audioFeatureExtractor
Streamline audio feature extraction
Description
audioFeatureExtractor encapsulates multiple audio feature
extractors into a streamlined and modular implementation.
Creation
Description
aFE = audioFeatureExtractor() creates an audio feature extractor with default property values.
aFE = audioFeatureExtractor(Name=Value) specifies nondefault properties for aFE using one or more name-value arguments.
Properties
Main Properties
Analysis window, specified as a real vector.
Data Types: single | double
Overlap length of adjacent analysis windows, specified as an integer in the range
[0, numel(Window)).
Data Types: single | double
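The relationship between the window length, the overlap length, and the number of analysis frames can be sketched with plain arithmetic. This sketch assumes that only complete windows are analyzed and that the signal is not padded. The names hopLength and numHops are illustrative variables, not properties of the object.

```matlab
% Illustrative values: a 1024-sample window with 50% overlap
winLength     = 1024;    % numel(Window)
overlapLength = 512;     % must lie in [0, winLength)
numSamples    = 44100;   % one second of audio at 44.1 kHz

% Each new analysis window advances by the hop length
hopLength = winLength - overlapLength;

% Number of complete windows that fit in the signal (no padding assumed)
numHops = floor((numSamples - winLength)/hopLength) + 1;
```

Under these assumptions, a larger OverlapLength shortens the hop and increases the number of rows in the output of extract.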
FFT length, specified as an integer. The default value of []
means that the FFT length is equal to the window length numel(Window).
Data Types: single | double
Input sample rate in Hz, specified as a positive scalar.
Data Types: single | double
Input to spectral descriptors, specified as "linearSpectrum",
"melSpectrum", "barkSpectrum", or
"erbSpectrum".
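Spectral descriptors affected by this property are spectralCentroid, spectralCrest, spectralDecrease, spectralEntropy, spectralFlatness, spectralFlux, spectralKurtosis, spectralRolloffPoint, spectralSkewness, spectralSlope, and spectralSpread.
The spectrum input to the spectral descriptors is the same as the output from the corresponding feature: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.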
For example, if you set SpectralDescriptorInput to "barkSpectrum" and spectralCentroid to true, then aFE returns the centroid of the default Bark spectrum.
[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
aFE = audioFeatureExtractor(SampleRate=fs, ...
    SpectralDescriptorInput="barkSpectrum", ...
    spectralCentroid=true);
barkSpectralCentroid = extract(aFE,audioIn);
If you specify a nondefault barkSpectrum using setExtractorParameters, then the nondefault Bark spectrum is the input to the spectral descriptors. For example, if you call setExtractorParameters(aFE,"barkSpectrum",NumBands=40), then aFE returns the centroid of a 40-band Bark spectrum.
setExtractorParameters(aFE,"barkSpectrum",NumBands=40)
bark40SpectralCentroid = extract(aFE,audioIn);
Data Types: char | string
This property is read-only.
Total number of features output from extract for the current
object configuration, specified as a positive integer.
FeatureVectorLength is equal to the second dimension of the
output from the extract
function.
Data Types: single | double
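As a minimal sketch of how FeatureVectorLength relates to the output of extract, the following code compares the property with the width of the feature matrix. The synthetic noise signal and the 16 kHz sample rate are assumptions chosen so the snippet does not depend on an audio file.

```matlab
fs = 16e3;
x = randn(fs,1);   % one second of synthetic noise stands in for real audio

aFE = audioFeatureExtractor(SampleRate=fs, ...
    mfcc=true, ...
    spectralCentroid=true, ...
    shortTimeEnergy=true);

features = extract(aFE,x);

% Compare the reported feature count with the second dimension of the output
numFeatures = aFE.FeatureVectorLength;
sameWidth = (numFeatures == size(features,2));
```

Because the property is read-only, it can be queried before calling extract, for example to preallocate storage for the features.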
Features to Extract
Extract the one-sided linear spectrum, specified as true or
false.
To set parameters of the linear spectrum extraction, use setExtractorParameters:
setExtractorParameters(aFE,"linearSpectrum",Name=Value)
- FrequencyRange: Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
- SpectrumType: Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
- WindowNormalization: Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.
Data Types: logical
Extract the one-sided mel spectrum, specified as true or
false.
To set parameters of the mel spectrum extraction, use setExtractorParameters:
setExtractorParameters(aFE,"melSpectrum",Name=Value)
- FrequencyRange: Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
- SpectrumType: Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
- NumBands: Number of mel bands, specified as an integer. If unspecified, NumBands defaults to 32.
- FilterBankNormalization: Normalization applied to bandpass filters, specified as "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".
- WindowNormalization: Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.
- FilterBankDesignDomain: Domain in which the filter bank is designed, specified as either "linear" or "warped". If unspecified, FilterBankDesignDomain defaults to "linear".
- MelStyle: Style of the mel scale used, specified as either "oshaughnessy" or "slaney". If unspecified, MelStyle defaults to "oshaughnessy".
- ApplyLog: Apply base 10 logarithm to the auditory spectrum, specified as true or false. If unspecified, ApplyLog defaults to false.
Data Types: logical
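The following sketch shows one way to configure a nondefault mel spectrum and confirm the resulting number of output columns with info. The synthetic input signal, the sample rate, and the choice of 64 bands are illustrative assumptions.

```matlab
fs = 16e3;
x = randn(fs,1);   % synthetic noise used only to exercise the extractor

aFE = audioFeatureExtractor(SampleRate=fs,melSpectrum=true);

% Request a 64-band, log-compressed mel spectrum instead of the 32-band default
setExtractorParameters(aFE,"melSpectrum",NumBands=64,ApplyLog=true)

melSpec = extract(aFE,x);

% info maps each enabled feature to its columns in the output
idx = info(aFE);
numMelColumns = numel(idx.melSpectrum);
```

The same calling pattern applies to the barkSpectrum and erbSpectrum extractors, using the parameters listed for each.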
Extract the one-sided Bark spectrum, specified as true or
false.
To set parameters of the Bark spectrum extraction, use setExtractorParameters:
setExtractorParameters(aFE,"barkSpectrum",Name=Value)
- FrequencyRange: Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
- SpectrumType: Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
- NumBands: Number of Bark bands, specified as an integer. If unspecified, NumBands defaults to 32.
- FilterBankNormalization: Normalization applied to bandpass filters, specified as "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".
- WindowNormalization: Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.
- FilterBankDesignDomain: Domain in which the filter bank is designed, specified as either "linear" or "warped". If unspecified, FilterBankDesignDomain defaults to "linear".
- ApplyLog: Apply base 10 logarithm to the auditory spectrum, specified as true or false. If unspecified, ApplyLog defaults to false.
Data Types: logical
Extract the one-sided ERB spectrum, specified as true or
false.
To set parameters of the ERB spectrum extraction, use setExtractorParameters:
setExtractorParameters(aFE,"erbSpectrum",Name=Value)
- FrequencyRange: Frequency range of the extracted spectrum in Hz, specified as a two-element vector of increasing numbers in the range [0, SampleRate/2]. If unspecified, FrequencyRange defaults to [0, SampleRate/2].
- SpectrumType: Spectrum type, specified as "power" or "magnitude". If unspecified, SpectrumType defaults to "power".
- NumBands: Number of ERB bands, specified as an integer. If unspecified, NumBands defaults to ceil(hz2erb(FrequencyRange(2))-hz2erb(FrequencyRange(1))).
- FilterBankNormalization: Normalization applied to bandpass filters, specified as "bandwidth", "area", or "none". If unspecified, FilterBankNormalization defaults to "bandwidth".
- WindowNormalization: Apply window normalization, specified as true or false. If unspecified, WindowNormalization defaults to true.
- ApplyLog: Apply base 10 logarithm to the auditory spectrum, specified as true or false. If unspecified, ApplyLog defaults to false.
Data Types: logical
Extract mel-frequency cepstral coefficients (MFCC), specified as
true or false.
To set parameters of the MFCC extraction, use setExtractorParameters:
setExtractorParameters(aFE,"mfcc",Name=Value)
- NumCoeffs: Number of coefficients returned for each window, specified as a positive integer. If unspecified, NumCoeffs defaults to 13.
- DeltaWindowLength: Delta window length, specified as an odd integer greater than 2. If unspecified, DeltaWindowLength defaults to 9. This parameter affects the mfccDelta and mfccDeltaDelta features.
- Rectification: Type of nonlinear rectification, specified as "log" or "cubic-root".
The mel-frequency cepstral coefficients are calculated using the melSpectrum.
Data Types: logical
Extract delta of MFCC, specified as true or
false.
The delta MFCC is calculated based on the extracted MFCC. Parameters set on
mfcc affect mfccDelta.
Data Types: logical
Extract delta-delta of MFCC, specified as true or
false.
The delta-delta MFCC is calculated based on the extracted MFCC. Parameters set on
mfcc affect mfccDeltaDelta.
Data Types: logical
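Because parameters set on mfcc also govern mfccDelta and mfccDeltaDelta, a single setExtractorParameters call changes the width of all three feature groups. The sketch below illustrates this with a synthetic signal. The sample rate, NumCoeffs=20, and DeltaWindowLength=5 are arbitrary illustrative choices.

```matlab
fs = 16e3;
x = randn(fs,1);   % synthetic noise used only to exercise the extractor

aFE = audioFeatureExtractor(SampleRate=fs, ...
    mfcc=true, ...
    mfccDelta=true, ...
    mfccDeltaDelta=true);

% Parameters are set on "mfcc" only; the delta features inherit them
setExtractorParameters(aFE,"mfcc",NumCoeffs=20,DeltaWindowLength=5)

features = extract(aFE,x);
idx = info(aFE);

numMfcc       = numel(idx.mfcc);
numDelta      = numel(idx.mfccDelta);
numDeltaDelta = numel(idx.mfccDeltaDelta);
```

The gtcc, gtccDelta, and gtccDeltaDelta features follow the same pattern, with parameters set on "gtcc".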
Extract gammatone cepstral coefficients (GTCC), specified as
true or false.
To set parameters of the GTCC extraction, use setExtractorParameters:
setExtractorParameters(aFE,"gtcc",Name=Value)
- NumCoeffs: Number of coefficients returned for each window, specified as a positive integer. If unspecified, NumCoeffs defaults to 13.
- DeltaWindowLength: Delta window length, specified as an odd integer greater than 2. If unspecified, DeltaWindowLength defaults to 9. This parameter affects the gtccDelta and gtccDeltaDelta features.
- Rectification: Type of nonlinear rectification, specified as "log" or "cubic-root".
The gammatone cepstral coefficients are calculated using the erbSpectrum.
Data Types: logical
Extract delta of GTCC, specified as true or
false.
The delta GTCC is calculated based on the extracted GTCC. Parameters set on
gtcc affect gtccDelta.
Data Types: logical
Extract delta-delta of GTCC, specified as true or
false.
The delta-delta GTCC is calculated based on the extracted GTCC. Parameters set on
gtcc affect gtccDeltaDelta.
Data Types: logical
Extract spectral centroid, specified as true or
false.
The spectral centroid is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
Extract spectral crest, specified as true or
false.
The spectral crest is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
Extract spectral decrease, specified as true or
false.
The spectral decrease is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
Extract spectral entropy, specified as true or
false.
The spectral entropy is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
Extract spectral flatness, specified as true or
false.
The spectral flatness is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
Extract spectral flux, specified as true or
false.
The spectral flux is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
To set parameters of the spectral flux extraction, use setExtractorParameters:
setExtractorParameters(aFE,"spectralFlux",Name=Value)
- NormType: Norm type used to calculate the spectral flux, specified as 1 or 2. If unspecified, NormType defaults to 2.
Data Types: logical
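To illustrate what NormType controls, the following sketch computes a frame-to-frame flux on a toy spectrogram. It assumes the common definition of spectral flux as the p-norm of the difference between adjacent spectra. The toolbox implementation may differ in details such as scaling or the handling of the first frame.

```matlab
% Toy spectrogram: rows are frequency bins, columns are consecutive frames
S = [1 2 4;
     3 4 4;
     5 6 9];

% Bin-wise magnitude of change between adjacent frames
D = abs(diff(S,1,2));

flux1 = sum(D,1);            % NormType = 1 (sum of absolute differences)
flux2 = sqrt(sum(D.^2,1));   % NormType = 2 (Euclidean norm of the difference)
```

With NormType set to 2, large changes in a few bins dominate the result more than they do with NormType set to 1.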
Extract spectral kurtosis, specified as true or
false.
The spectral kurtosis is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
Extract spectral rolloff point, specified as true or
false.
The spectral rolloff point is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
To set parameters of the spectral rolloff point extraction, use setExtractorParameters:
setExtractorParameters(aFE,"spectralRolloffPoint",Name=Value)
- Threshold: Threshold of the rolloff point, specified as a scalar in the range (0, 1). If unspecified, Threshold defaults to 0.95.
Data Types: logical
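The role of Threshold can be sketched on a single toy spectrum. This sketch assumes the usual definition of the rolloff point as the lowest frequency at or below which the given fraction of the total spectral content lies. The bin frequencies and spectral values are made up for illustration, and the toolbox implementation may differ in detail.

```matlab
f = [0 100 200 300];   % bin center frequencies in Hz (illustrative)
s = [4 3 2 1];         % spectral values of one frame (illustrative)
threshold = 0.85;      % fraction of the total to capture

% Fraction of the total accumulated up to each bin: [0.4 0.7 0.9 1.0]
cumulativeShare = cumsum(s)/sum(s);

% First bin at which the accumulated share reaches the threshold
rolloffIdx = find(cumulativeShare >= threshold,1);
rolloffPoint = f(rolloffIdx);
```

Raising the threshold moves the rolloff point toward higher frequencies, because more of the spectrum must be accumulated before the threshold is reached.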
Extract spectral skewness, specified as true or
false.
The spectral skewness is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
Extract spectral slope, specified as true or
false.
The spectral slope is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
Extract spectral spread, specified as true or
false.
The spectral spread is calculated on one of the following spectral representations, as specified by the SpectralDescriptorInput property: linearSpectrum, melSpectrum, barkSpectrum, or erbSpectrum.
Data Types: logical
Extract pitch, specified as true or
false.
To set parameters of the pitch extraction, use setExtractorParameters:
setExtractorParameters(aFE,"pitch",Name=Value)
- Method: Method used to calculate the pitch, specified as "PEF", "NCF", "CEP", "LHS", or "SRH". If unspecified, Method defaults to "NCF". For a description of available pitch extraction methods, see pitch.
- Range: Range within which to search for the pitch in Hz, specified as a two-element row vector of increasing values. If unspecified, Range defaults to [50,400].
- MedianFilterLength: Median filter length used to smooth pitch estimates over time, specified as a positive integer. If unspecified, MedianFilterLength defaults to 1 (no median filtering).
Data Types: logical
Extract harmonic ratio, specified as true or
false.
Data Types: logical
Extract zero-crossing rate, specified as true or
false.
To set parameters of the zero-crossing rate extraction, use setExtractorParameters:
setExtractorParameters(aFE,"zerocrossrate",Name=Value)
- Method: Method for computing the zero-crossing rate, specified as "difference" or "comparison". If unspecified, Method defaults to "difference". For more information, see zerocrossrate.
- Level: Signal level for which the crossing rate is computed, specified as a real scalar. audioFeatureExtractor subtracts the Level value from the signal and then finds the zero crossings. If unspecified, Level defaults to 0.
- Threshold: Threshold above and below the Level value over which the crossing rate is computed, specified as a real scalar. audioFeatureExtractor sets all the values of the input in the range [-Threshold, Threshold] to 0 and then finds the zero crossings. If unspecified, Threshold defaults to 0.
- TransitionEdge: Transitions to include when counting zero crossings, specified as "falling", "rising", or "both". If you specify "falling", only negative-going transitions are counted. If you specify "rising", only positive-going transitions are counted. If unspecified, TransitionEdge defaults to "both".
- ZeroPositive: Sign convention, specified as a logical scalar. If you specify ZeroPositive as true, then 0 is considered positive. If you specify ZeroPositive as false, then audioFeatureExtractor considers 0, -1, and +1 to have distinct signs following the convention of the sign function. If unspecified, ZeroPositive defaults to false.
Data Types: logical
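The combined effect of Level, Threshold, and ZeroPositive can be sketched on a short toy signal. This is a simplified illustration of the steps described above (subtract the level, zero out small values, count sign changes with 0 treated as positive), not the toolbox implementation. It counts crossings and does not normalize them into a rate.

```matlab
x = [0.5 -0.05 0.6 -0.7 0.8];   % toy signal with one small dip below zero
level = 0;
threshold = 0.1;

% Subtract the level, then zero out values inside [-threshold, threshold]
y = x - level;
y(abs(y) <= threshold) = 0;

% Treat 0 as positive (the ZeroPositive = true convention) and count sign changes
isPositive = double(y >= 0);
numCrossings = sum(diff(isPositive) ~= 0);

% For comparison: the same count without thresholding
isPositiveRaw = double((x - level) >= 0);
numCrossingsRaw = sum(diff(isPositiveRaw) ~= 0);
```

In this toy case, the threshold removes the two crossings caused by the small dip at -0.05 and leaves only the crossings around the larger excursion at -0.7.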
Extract short-time energy, specified as true or
false. The short-time energy is computed using
sTE = sum(xbw.^2,1),
where xbw is the buffered and windowed
signal.
Data Types: logical
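The short-time energy expression can be reproduced by hand on a toy signal. The sketch below buffers the signal into overlapping columns, applies the window, and sums the squares. It assumes only complete frames are used, and it uses a rectangular window and a constant signal so that the result is easy to check. The variable names other than xbw and sTE are illustrative.

```matlab
x = ones(10,1);      % toy signal
win = ones(4,1);     % rectangular window, chosen for easy verification
overlapLength = 2;

winLength = numel(win);
hopLength = winLength - overlapLength;
numHops = floor((numel(x) - winLength)/hopLength) + 1;

% Buffer the signal into overlapping columns, one column per analysis window
xb = zeros(winLength,numHops);
for k = 1:numHops
    startIdx = (k-1)*hopLength + 1;
    xb(:,k) = x(startIdx:startIdx+winLength-1);
end

% Apply the window to every column, then sum squares down each column
xbw = xb .* win;
sTE = sum(xbw.^2,1);
```

Each column here contains four ones, so every element of sTE equals 4.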
Object Functions
extract | Extract audio features |
setExtractorParameters | Set nondefault parameter values for individual feature extractors |
info | Output mapping and individual feature extractor parameters |
generateMATLABFunction | Create MATLAB function compatible with C/C++ code generation |
plotFeatures | Plot extracted audio features |
Examples
Read in an audio signal.
[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
Create an audioFeatureExtractor object that extracts the MFCC, delta MFCC, delta-delta MFCC, pitch, spectral centroid, zero-crossing rate, and short-time energy of the signal. Use a 30 ms analysis window with 20 ms overlap.
aFE = audioFeatureExtractor( ...
    SampleRate=fs, ...
    Window=hamming(round(0.03*fs),"periodic"), ...
    OverlapLength=round(0.02*fs), ...
    mfcc=true, ...
    mfccDelta=true, ...
    mfccDeltaDelta=true, ...
    pitch=true, ...
    spectralCentroid=true, ...
    zerocrossrate=true, ...
    shortTimeEnergy=true);
Call extract to extract the audio features from the audio signal.
features = extract(aFE,audioIn);
Use info to determine which column of the feature extraction matrix corresponds to the requested pitch extraction.
idx = info(aFE)
idx = struct with fields:
mfcc: [1 2 3 4 5 6 7 8 9 10 11 12 13]
mfccDelta: [14 15 16 17 18 19 20 21 22 23 24 25 26]
mfccDeltaDelta: [27 28 29 30 31 32 33 34 35 36 37 38 39]
spectralCentroid: 40
pitch: 41
zerocrossrate: 42
shortTimeEnergy: 43
Plot the detected pitch over time.
t = linspace(0,size(audioIn,1)/fs,size(features,1));
plot(t,features(:,idx.pitch))
title("Pitch")
xlabel("Time (s)")
ylabel("Frequency (Hz)")

Plot the zero-crossing rate over time.
plot(t,features(:,idx.zerocrossrate))
title("Zero-Crossing Rate")
xlabel("Time (s)")

Plot the short-time energy over time.
plot(t,features(:,idx.shortTimeEnergy))
title("Short-Time Energy")
xlabel("Time (s)")

Create an audio datastore that points to audio samples included with Audio Toolbox™.
folder = fullfile(matlabroot,"toolbox","audio","samples");
ads = audioDatastore(folder);
Create an audioFeatureExtractor object to extract the mel spectrum, Bark spectrum, ERB spectrum, and linear spectrum from each audio file. Use the default analysis window and overlap length for the spectrum extraction.
aFE = audioFeatureExtractor(SampleRate=44.1e3, ...
    melSpectrum=true, ...
    barkSpectrum=true, ...
    erbSpectrum=true, ...
    linearSpectrum=true);
Call extract to extract the features from each audio file in the datastore. Specify SampleRateMismatchRule as "resample" to resample the audio files in the datastore if they do not match 44.1 kHz, the sample rate of the audioFeatureExtractor object. If you have Parallel Computing Toolbox™, specify UseParallel as true to read the files and extract the features in parallel.
specs = extract(aFE,ads,SampleRateMismatchRule="resample",UseParallel=true);
Starting parallel pool (parpool) using the 'Processes' profile ...
17-Dec-2024 09:28:59: Job Queued. Waiting for parallel pool job with ID 3 to start ...
Connected to parallel pool with 4 workers.
The specs variable is a numFiles-by-1 cell array, where numFiles is the number of files in the datastore. Each element of the cell array is a numHops-by-numFeatures-by-numChannels array, where the number of hops and the number of channels depend on the length and number of channels of the audio file, and the number of features is the number of features requested from the audio data.
numFiles = numel(specs)
numFiles = 39
[numHops1,numFeaturesFile1,numChanelsFile1] = size(specs{1})
numHops1 = 1053
numFeaturesFile1 = 620
numChanelsFile1 = 1
[numHops2,numFeaturesFile2,numChanelsFile2] = size(specs{2})
numHops2 = 1724
numFeaturesFile2 = 620
numChanelsFile2 = 4
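The feature count of 620 reported above can be reconciled with the documented defaults by a short tally. The sketch assumes a one-sided spectrum of floor(FFTLength/2)+1 bins for the default 1024-point window, and it replaces hz2erb with a hypothetical stand-in, hz2erbSketch, based on a Glasberg-Moore style formula. The exact constants used by hz2erb may differ slightly.

```matlab
% Linear spectrum: one-sided bins of a 1024-point FFT (default window length)
fftLength = 1024;
numLinearBins = floor(fftLength/2) + 1;

% Mel and Bark spectra: documented default of 32 bands each
numMelBands  = 32;
numBarkBands = 32;

% ERB spectrum: ceil(hz2erb(FrequencyRange(2)) - hz2erb(FrequencyRange(1)))
% hz2erbSketch is an assumed approximation standing in for hz2erb
hz2erbSketch = @(f) 21.4*log10(1 + 0.00437*f);
frequencyRange = [0 44.1e3/2];
numErbBands = ceil(hz2erbSketch(frequencyRange(2)) - hz2erbSketch(frequencyRange(1)));

totalFeatures = numLinearBins + numMelBands + numBarkBands + numErbBands;
```

Under these assumptions, the tally is 513 + 32 + 32 + 43 = 620, which matches numFeaturesFile1 and numFeaturesFile2.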
Use plotFeatures to visualize audio features extracted with an audioFeatureExtractor object.
Read in an audio signal from a file.
[audioIn,fs] = audioread("Counting-16-44p1-mono-15secs.wav");
Create an audioFeatureExtractor object that extracts the gammatone cepstral coefficients (GTCCs) and the delta of the GTCCs. Set the SampleRate property to the sample rate of the audio signal, and use the default values for the other properties.
afe = audioFeatureExtractor(SampleRate=fs,gtcc=true,gtccDelta=true);
Plot the features extracted from the audio signal.
plotFeatures(afe,audioIn)

Algorithms
The audioFeatureExtractor creates a feature extraction pipeline based on
your selected features. To reduce computations, audioFeatureExtractor reuses
intermediary representations and outputs some intermediate representations as features.

For example, to create an object that extracts the centroid of the Bark spectrum, the flux
of the Bark spectrum, the pitch, the harmonic ratio, and the delta-delta of the MFCC, specify
the audioFeatureExtractor as
follows.
aFE = audioFeatureExtractor( ...
    SpectralDescriptorInput="barkSpectrum", ...
    spectralCentroid=true, ...
    spectralFlux=true, ...
    pitch=true, ...
    harmonicRatio=true, ...
    mfccDeltaDelta=true)
aFE =
audioFeatureExtractor with properties:
Properties
Window: [1024×1 double]
OverlapLength: 512
SampleRate: 44100
FFTLength: []
SpectralDescriptorInput: 'barkSpectrum'
Enabled Features
mfccDeltaDelta, spectralCentroid, spectralFlux, pitch, harmonicRatio
Disabled Features
linearSpectrum, melSpectrum, barkSpectrum, erbSpectrum, mfcc, mfccDelta
gtcc, gtccDelta, gtccDeltaDelta, spectralCrest, spectralDecrease, spectralEntropy
spectralFlatness, spectralKurtosis, spectralRolloffPoint, spectralSkewness, spectralSlope, spectralSpread
To extract a feature, set the corresponding property to true.
For example, obj.mfcc = true, adds mfcc to the list of enabled features.
Note
Because audioFeatureExtractor reuses intermediary representations, the
features output from audioFeatureExtractor might not correspond with the
default configuration of features output by corresponding individual feature
extractors.
Extended Capabilities
Usage notes and limitations:
- You cannot generate code directly from audioFeatureExtractor. You can generate C/C++ code from the function returned by generateMATLABFunction.
- Functions returned by generateMATLABFunction that compute an auditory spectrum (mel, Bark, ERB) support optimized code generation using single instruction, multiple data (SIMD) instructions. For more information about SIMD code generation, see Generate SIMD Code from MATLAB Functions for Intel Platforms (MATLAB Coder).
- zerocrossrate code generation does not support disabling dynamic memory allocation when the input is multichannel.
This function fully supports GPU arrays. For more information, see Run MATLAB Functions on a GPU (Parallel Computing Toolbox).
Version History
Introduced in R2019b
The setExtractorParams object function has been removed. Use setExtractorParameters instead.
The Normalization parameter of the
melSpectrum, barkSpectrum, and
erbSpectrum features has been removed. Use the
FilterBankNormalization parameter for these features instead.
Use setExtractorParameters to set the ApplyLog parameter of
the melSpectrum, barkSpectrum, and
erbSpectrum features to true to apply a base 10
logarithm to the auditory spectrum.
Using the Normalization parameter of the
melSpectrum, barkSpectrum, and
erbSpectrum issues a warning that it will be removed in a future
release. Use the FilterBankNormalization parameter for these features
instead.
Use setExtractorParameters to set the MelStyle parameter of
the melSpectrum feature to "slaney" to use the
Slaney-style mel scale.
Functions returned by generateMATLABFunction that compute an auditory spectrum (mel, Bark, ERB)
support optimized C/C++ code generation using single instruction, multiple data (SIMD)
instructions.
Use the plotFeatures
object function to visualize extracted audio features.
The audioDelta
function is now used to compute mfccDelta,
mfccDeltaDelta, gtccDelta, and
gtccDeltaDelta. The audioDelta algorithm has a
different startup behavior than the previous algorithm. The default window length used to
compute the deltas has changed from 2 to 9. A delta
window length of 2 is no longer supported.