Introduction
A module is a content analysis function that works either on
information provided by tier 1 or on the results of another module.
Tier 2 provides the data structures used by modules and an interface to
dynamically load and call a module from a library. The goal of a
module is to extract higher-level content analysis features from
low-level features.
A set of fundamental modules is distributed with Maaate as a
set of default modules in a separate library. The features are
described in the CSIRO Technical
Report No.196, December 2001.
If you would like to add new modules or libraries to Maaate, read
the module guide.
Maaate Fundamental Modules List
The Maaate Fundamental Modules Library contains the following
modules:
- Pre-processed features:
These modules simply give access to the pre-processed
features of the SOUNDfile class.
- Energy features:
Signal energy features are closely related to human loudness
perception. The disadvantage of computing them in the frequency
domain is that the energy is spread over subbands, which have to be
summed, requiring more computation than in the time domain. Signal
energy features are often used for segmentation of an audio
stream.
- Bandwidth features:
Bandwidth may be used for speech/music recognition as speech
usually has a narrower bandwidth than music.
- Spectral energy statistics:
Spectral energy statistics capture the subband energy
distribution, which is indicative of specific types of sound.
- Silence statistics:
Silence statistics are often used as indicators for
classification of audio segments into different signal classes.
Speech segments for example generally contain a lot more silence
than music segments.
- Noise statistics:
Noise statistics may be used to detect sounds of
comparatively high loudness such as explosions or crowd cheers.
- Segmentation
- Others:
Some convenient statistical functions are also available.
Legend
In the formulae, all indexes start at 0. The window index also
starts at 0, and the windows cover the whole file.
- n: time index 0 <= n <= N-1
- N: duration of file
- i: subband index 0 <= i <= I-1
- I: number of subbands
- Si(n): the value of subband i at time n. In the following
formulae, Si(n) is used as a placeholder, and any of the
pre-processed values may be used in its place.
- m: time position within window 0 <= m <= M-1
- M: number of time divisions (frequency samples) in one
window, i.e. the window size.
- o: overlap of successive analysis windows
- t: analysis window number 0 <= t <= (N-1)/(M-o)
- h(m): window function at position m within the window; may be one
of: Rectangular, Hamming, Welch, Bartlett, depending on the
required narrowness or peakedness of spectral leakage.
- Ts: threshold.
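The window indexing in the legend can be sketched as follows. This is a minimal illustration, assuming that successive windows advance by M - o samples and that the last window may be shorter than M; the function name `window_starts` is hypothetical and not part of Maaate.

```python
def window_starts(N, M, o):
    """Return the start index of every analysis window.

    N: number of time samples in the file
    M: window size
    o: overlap between successive windows (0 <= o < M)
    """
    if not 0 <= o < M:
        raise ValueError("overlap must satisfy 0 <= o < M")
    hop = M - o               # distance between successive window starts
    last_t = (N - 1) // hop   # largest window number t, as in the legend
    return [t * hop for t in range(last_t + 1)]
```

For example, with N = 10, M = 4 and o = 2 the windows start every 2 samples and t runs from 0 to 4.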
Normalised Subband Energies
Description:
Gives an energy spectrum and allows the spread of energy over
subbands to be observed. The normalisation removes the dependency
on the sound level of the audio source.
Resolution: Window.
Parameters: SOUNDfile, start time, end time, start subband
number, end subband number.
Based on the Subband Mean.
Formula:
Normalised Subband Values Energies
Description:
Gives an energy spectrum and allows the spread of energy over
subbands to be observed. The normalisation removes the dependency
on the sound level of the audio source.
Resolution: Frequency sample.
Parameters: SOUNDfile, start time, end time, start subband
number, end subband number.
Based on the Subband Values Mean.
Formula:
Subband Scalefactors (Channel 0)
Description:
Returns a spectrum giving, for each selected subband, the maximum
of the frequency samples in each window.
Resolution: Window.
Parameters: SOUNDfile, start time, end time, start subband
number, end subband number.
For MPEG layer 1 and 2 files, it directly uses the scalefactor
values as an approximation of the maximum.
Formula:
Subband Mean
Description:
Give a spectrum (this provides a way to go from a frequency
sample resolution to a window resolution).
Resolution: Window.
Parameters: SOUNDfile, start time, end time, start subband
number, end subband number.
Based on the Subband Values Mean.
Formula:
Subband RMS
Description:
Return a spectrum. The Root Mean Squared subband vector is a
better measure of signal energy than the subband values
themselves.
Resolution: Window.
Parameters: SOUNDfile, start time, end time, start subband
number, end subband number.
Based on the Subband Values Mean.
Formula:
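As an illustration of the Subband RMS idea, the sketch below computes a root mean square per subband over the M frequency samples of one window. It assumes the textbook RMS definition and a window stored as a list of M rows of I subband values; the function name `subband_rms` and this data layout are hypothetical, not Maaate's API.

```python
import math

def subband_rms(window):
    """Return one RMS value per subband for a single analysis window.

    window: list of M time positions, each a list of I subband values.
    """
    M = len(window)
    I = len(window[0])
    return [
        math.sqrt(sum(window[m][i] ** 2 for m in range(M)) / M)
        for i in range(I)
    ]
```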
Subband Values
Description:
Give a spectrum of the raw subband values directly extracted
from the analysed file for channel 0.
Resolution: Frequency sample.
Parameters: SOUNDfile, start time, end time, start subband
number, end subband number.
Subband Values Mean
Description:
Give a spectrum of the mean of subband values directly extracted
from the analysed file over channels.
Resolution: Frequency sample.
Parameters: SOUNDfile, start time, end time, start subband
number, end subband number.
Based on the Subband Values.
Subband Values RMS
Description:
Give a spectrum of the RMS of subband values directly extracted
from the analysed file over channels.
Resolution: Frequency sample.
Parameters: SOUNDfile, start time, end time, start subband
number, end subband number.
Based on the Subband Values.
Band Energy
Description:
An approximation of the signal energy over the selected subbands.
Resolution: Window.
Parameters: SOUNDfile, start time, end time, start subband, end
subband, window number.
Based on the Subband Values Mean.
Formula:
Signal Energy
Description:
An approximation of the signal energy.
Resolution: Window.
Parameters: SOUNDfile, start time, end time, window number.
Based on the Subband Values Mean.
Formula:
Signal Magnitude
Description:
An approximation of perceptual loudness that is less sensitive to
noise than Signal Energy.
Resolution: Window.
Parameters: SOUNDfile, start time, end time, window number.
Based on the Subband Values Mean.
Formula:
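The difference between Signal Energy and Signal Magnitude can be sketched as a sum of squares versus a sum of absolute values over one window. This is only an assumption based on the common definitions of these two features, not a reproduction of Maaate's formulae; the names `hamming`, `signal_energy` and `signal_magnitude` are hypothetical, and the window function defaults to rectangular.

```python
import math

def hamming(M):
    """Standard Hamming window of length M."""
    if M == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * m / (M - 1)) for m in range(M)]

def signal_energy(window, h=None):
    """Sum of squared, window-weighted subband values over one window."""
    M = len(window)
    if h is None:
        h = [1.0] * M  # rectangular window
    return sum((h[m] * s) ** 2 for m in range(M) for s in window[m])

def signal_magnitude(window, h=None):
    """Sum of absolute, window-weighted subband values over one window."""
    M = len(window)
    if h is None:
        h = [1.0] * M  # rectangular window
    return sum(h[m] * abs(s) for m in range(M) for s in window[m])
```

Because values are not squared, a few large outliers contribute less to the magnitude than to the energy, which is one way to read the "less sensitive to noise" remark.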
Sum of Scalefactors
Description:
Provides a faster approximation of the Signal
Magnitude.
Resolution: Window.
Parameters: SOUNDfile, start time, end time, start subband
number, end subband number.
Based on the Scalefactors.
Formula:
Simply sum the scalefactors over a whole window.
Bandwidth
Description:
Return the bandwidth, i.e. the minimum and maximum frequencies,
determined using a threshold.
Resolution: Window.
Parameters: SOUNDfile, start time, end time, Ts
dynamic threshold.
Based on the Subband Mean.
Formula:
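A threshold-based bandwidth measure can be sketched as follows. The sketch assumes that a subband counts as active when its mean exceeds the threshold Ts and reports subband indexes rather than frequencies in Hz; the function name `bandwidth` is hypothetical.

```python
def bandwidth(subband_means, threshold):
    """Return (lowest, highest) subband index above the threshold.

    Returns None when no subband exceeds the threshold.
    """
    above = [i for i, v in enumerate(subband_means) if v > threshold]
    if not above:
        return None
    return above[0], above[-1]
```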
Signal Bandwidth
Description:
Return the bandwidth from a SegmentData, i.e. the minimum and
maximum frequencies, determined using a threshold.
Resolution: Window.
Parameters: SegmentData, start time, end time, Ts
threshold.
Formula:
Significant Subbands
Description:
Count the number of subbands with a significant level (higher
than the threshold).
Resolution: Window.
Parameters: SegmentData, start time, end time, Ts
threshold.
Based on the Subband Mean.
Formula:
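Counting significant subbands amounts to a simple threshold test per subband. A minimal sketch, assuming a strict "higher than the threshold" comparison as stated in the description; the function name `significant_subbands` is hypothetical.

```python
def significant_subbands(subband_means, threshold):
    """Count the subbands whose value is strictly above the threshold."""
    return sum(1 for v in subband_means if v > threshold)
```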
Band Energy Ratio
Description:
Relates the energy of the low frequencies to the energy of the
high frequencies. This is an indicator of the voicedness of a
sound.
Resolution: window.
Parameters: SOUNDfile, start time, end time, J subband boundary,
window function number.
Based on the Subband Mean.
Formula:
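One way to express a band energy ratio is to split the subbands at the boundary J and divide the low-band energy by the high-band energy. This is an assumption about the exact form (some definitions divide by the total energy instead), and the window function is ignored here; the function name `band_energy_ratio` is hypothetical.

```python
def band_energy_ratio(subband_means, J):
    """Ratio of energy in subbands below J to energy in subbands from J up.

    Returns None when the high band carries no energy.
    """
    low = sum(v * v for v in subband_means[:J])
    high = sum(v * v for v in subband_means[J:])
    if high == 0:
        return None
    return low / high
```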
Central Moment
Description:
It calculates statistics within subbands over several windows.
It captures how much a subband's energy is dispersed from its
mean.
Resolution: several windows.
Parameters: SOUNDfile, start time, end time, start subband
number, end subband number, duration of the analysis window, k
parameter.
Based on the Subband Mean.
Formula:
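The dispersion of a subband's energy around its mean can be illustrated with the standard k-th central moment, taken over the per-window values of a single subband. This sketch assumes the usual statistical definition; the function name `central_moment` is hypothetical.

```python
def central_moment(values, k):
    """k-th central moment of a sequence of per-window subband values."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** k for v in values) / n
```

With k = 2 this is the (population) variance of the subband over the analysed windows.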
Spectral Centroid
Description:
The spectral centroid is the balancing point of the subband
energy distribution. It determines the frequency area around
which most of the signal energy concentrates and is thus closely
related to the time-domain Zero Crossing Rate feature. It is
also frequently used as an approximation for a perceptual
brightness measure.
Resolution: window.
Parameters: SOUNDfile, start time, end time, start subband
number, end subband number.
Based on the Subband RMS.
Formula:
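The "balancing point" can be sketched as a weighted mean of subband indexes, with the per-window Subband RMS values as weights. This follows the common definition of a spectral centroid and is an assumption about the details; the result is in subband units, and the function name `spectral_centroid` is hypothetical.

```python
def spectral_centroid(subband_rms):
    """Weighted mean subband index, using the RMS values as weights.

    Returns None for an all-zero spectrum.
    """
    total = sum(subband_rms)
    if total == 0:
        return None
    return sum(i * v for i, v in enumerate(subband_rms)) / total
```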
Spectral Flux
Description:
Measures the change in spectral energy distribution between two
successive windows.
Resolution: window.
Parameters: SOUNDfile, start time, end time.
Based on the Subband RMS.
Formula:
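A change measure between two successive windows can be sketched as the Euclidean distance between their Subband RMS vectors. This is one common form of spectral flux, used here as an assumption; other variants normalise the spectra first. The function name `spectral_flux` is hypothetical.

```python
import math

def spectral_flux(previous, current):
    """Euclidean distance between the spectra of two successive windows."""
    if len(previous) != len(current):
        raise ValueError("both spectra must have the same number of subbands")
    return math.sqrt(sum((c - p) ** 2 for p, c in zip(previous, current)))
```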
Spectral Rolloff
Description:
The spectral rolloff point R is the subband below which 85% of the
window's energy is concentrated. It is used to distinguish voiced
from unvoiced speech and music.
Resolution: window.
Parameters: SOUNDfile, start time, end time.
Based on the Subband RMS.
Formula:
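The rolloff point can be sketched as the smallest subband index at which the running sum of the spectrum reaches 85% of its total. The sketch assumes the per-window values are accumulated as given (for example Subband RMS values) and returns a subband index; the function name `spectral_rolloff` and the `fraction` parameter are hypothetical.

```python
def spectral_rolloff(subband_values, fraction=0.85):
    """Smallest subband index R whose cumulative sum reaches the fraction.

    Returns None for an all-zero spectrum.
    """
    total = sum(subband_values)
    if total == 0:
        return None
    target = fraction * total
    running = 0.0
    for i, v in enumerate(subband_values):
        running += v
        if running >= target:
            return i
    return len(subband_values) - 1  # guard against rounding error
```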
Low Energy
Description:
The percentage of windows that have less than the average
energy. It is used to separate speech from music.
Resolution: several windows.
Parameters: SegmentData, start time, end time, duration.
Shall use the Signal Energy.
Formula:
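The low-energy percentage can be sketched directly from the description: average the per-window energies over the analysis interval and count the windows that fall below that average. The function name `low_energy` and the choice of a plain arithmetic mean are assumptions.

```python
def low_energy(energies):
    """Percentage of windows whose energy is below the mean energy.

    energies: per-window signal energy values for one analysis interval.
    """
    n = len(energies)
    if n == 0:
        return None
    average = sum(energies) / n
    return 100.0 * sum(1 for e in energies if e < average) / n
```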
Pause Rate
Description:
The pause rate counts the number of silence segments in a time
interval corresponding to a window. It is proposed as an
indicator to separate speech from non-speech signals.
Resolution: several windows.
Parameters: SegmentData, start time, end time, Ts
dynamic threshold, duration.
Shall use the Signal Energy.
Formula:
Silences
Description:
It calculates relatively silent intervals from a loudness
contour.
Parameters: SegmentData, start time, end time, threshold,
minimum duration, maximum interrupt, onset time, release time.
Based on Segmentation.
Shall use Sum of Scalefactors as loudness
contour.
Background Noise Level
Description:
Calculates the level of background noise. It uses silence
segmentation and takes as the noise level the threshold at which
the first silence segment is found.
Parameters: SegmentData, start time, end time, minimum duration,
maximum interrupt, onset time, release time.
Based on Segmentation.
Shall use Sum of Scalefactors as input.
Noise
Description:
It calculates relatively noisy intervals from a loudness
contour.
Parameters: SegmentData, start time, end time, threshold,
minimum duration, maximum interrupt, onset time, release time.
Based on Segmentation.
Shall use Sum of Scalefactors as loudness
contour.
Segmentation
Description:
Using adaptive thresholding, windows are classified as
silence if their sum of scalefactors stays
under the threshold. A sequence of silence windows is
clustered into a pause segment if it covers at least the minimum
duration and is not interrupted by non-silence windows for longer
than the tolerated interruption.
Parameters: SegmentData, start time, end time, below, threshold,
minimum duration, maximum interrupt, onset time, release time.
Based on the Sum of Scalefactors.
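The clustering rule described above can be sketched as follows. This is a simplified illustration under several assumptions: the threshold is fixed rather than adaptive, durations are counted in windows rather than seconds, and the onset and release times are ignored. The function name `silence_segments` is hypothetical.

```python
def silence_segments(loudness, threshold, min_duration, max_interrupt):
    """Cluster silent windows into pause segments.

    loudness: per-window loudness values (e.g. sums of scalefactors)
    threshold: a window is silent when its value is below this
    min_duration: minimum segment length, in windows
    max_interrupt: longest tolerated run of non-silent windows
    Returns a list of (start, end) window indexes, end exclusive.
    """
    segments = []
    start = None        # first window of the current candidate segment
    last_silent = None  # most recent silent window in the candidate
    for t, value in enumerate(loudness):
        if value < threshold:
            if start is None:
                start = t
            last_silent = t
        elif start is not None and t - last_silent > max_interrupt:
            # The interruption is too long: close the candidate segment.
            if last_silent - start + 1 >= min_duration:
                segments.append((start, last_silent + 1))
            start = None
    # Close a candidate that is still open at the end of the input.
    if start is not None and last_silent - start + 1 >= min_duration:
        segments.append((start, last_silent + 1))
    return segments
```

A short interruption is absorbed into the surrounding segment, so two runs of silence separated by a single loud window are reported as one pause when `max_interrupt` is at least 1.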
Histogramme 1D
Description:
Show the distribution of the values of a curve.
Parameters: SegmentData, start time, end time, bins, start histo,
end histo.
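A one-dimensional histogram with the listed parameters can be sketched as below. It assumes equal-width bins between the start and end values, that values outside this range are ignored, and that a value equal to the end falls into the last bin; the function name `histogram_1d` is hypothetical.

```python
def histogram_1d(values, bins, start, end):
    """Count how many values fall into each of `bins` equal-width bins."""
    if bins <= 0 or end <= start:
        raise ValueError("need bins > 0 and end > start")
    counts = [0] * bins
    width = (end - start) / bins
    for v in values:
        if start <= v <= end:
            # Clamp so that v == end lands in the last bin.
            index = min(int((v - start) / width), bins - 1)
            counts[index] += 1
    return counts
```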
Variance
Description:
Computes the variance of the input on each interval of
`duration` seconds.
Parameters: SegmentData, start time, end time, duration.
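The per-interval variance can be sketched as follows, assuming the input is a uniformly sampled curve with `step` seconds between values and that the population variance is wanted. The function name `interval_variance` and the `step` parameter are hypothetical.

```python
def interval_variance(values, step, duration):
    """Variance of the input over consecutive intervals of `duration` seconds.

    values: uniformly sampled input curve
    step: time between two successive values, in seconds
    duration: length of each interval, in seconds
    """
    per_interval = max(1, round(duration / step))  # values per interval
    result = []
    for k in range(0, len(values), per_interval):
        chunk = values[k:k + per_interval]
        mean = sum(chunk) / len(chunk)
        result.append(sum((v - mean) ** 2 for v in chunk) / len(chunk))
    return result
```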