Silent periods of audio still get MFCC values, but they shouldn't
Hi there! I am planning to extract timbre features from audio using the mfcc function in MATLAB. The target audio is 4 s long in total, with 1 s of silence at the beginning and at the end. The function works well, except that the silent periods also get specific values for each coefficient, which I assumed should be 0. I don't know the reason for this or how to resolve it.
I know that I could just delete the silent regions before computing the MFCCs, but the 300 audio files I need to process all have different lengths. Some have 0.5 s of silence at the end, some have 0.7 s, and so on.
So I am wondering whether there is a better solution to this problem.
Thanks very much!
Answers (1)
Brian Hemmat on 6 May 2024
Hi Elaine,
Depending on what you're doing with this, removing the silence may not be necessary. A lot of machine learning models can handle that kind of "noise" and just ignore it--unless the amount of silence is correlated to the type of audio you're analyzing.
Even if your audio is not speech, the detectSpeech function will probably give reasonable start and end points for your region of interest. For example:
[audioIn,fs] = audioread('foo.wav');
% Call detectSpeech to get the beginning and end samples of a speech region
% (will probably work OK for lots of types of audio)
roi = detectSpeech(audioIn,fs);
% Remove the silence.
audioIn = audioIn(roi(1):roi(end));
% Extract MFCC.
featuresOut = mfcc(audioIn,fs);
Another option would be to use short-time energy. You can do that before calculating the mfcc, or at the same time while using audioFeatureExtractor, as in the sketch below.
[audioIn,fs] = audioread('foo.wav');
% Extract MFCC and short-time energy
afe = audioFeatureExtractor(mfcc=true,shortTimeEnergy=true,SampleRate=fs);
featuresOut = extract(afe,audioIn);
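% Remove the frames (rows) that correspond to silent regions.
% info(afe) returns the column index of each feature in featuresOut,
% so the energy values themselves must be read from that column.
idx = info(afe);
threshold = 0.2; % set empirically based on your dataset
isSilent = featuresOut(:,idx.shortTimeEnergy) < threshold;
featuresOut(isSilent,:) = [];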
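For the other variant mentioned above, computing the energy before calling mfcc, here is a minimal sketch that uses only base MATLAB. The synthetic signal, the 20 ms non-overlapping frames, and the relative threshold of 1% of the loudest frame are all assumptions for illustration, so tune them on your own files.

```matlab
% Synthetic stand-in for one file: 1 s silence, 2 s tone, 1 s silence.
fs = 16000;
t = (0:2*fs-1)'/fs;
audioIn = [zeros(fs,1); 0.5*sin(2*pi*440*t); zeros(fs,1)];

% Split into non-overlapping 20 ms frames (a trailing partial frame is dropped).
frameLen = round(0.02*fs);
numFrames = floor(numel(audioIn)/frameLen);
frames = reshape(audioIn(1:numFrames*frameLen),frameLen,numFrames);

% Mean energy per frame, compared against the loudest frame.
energy = mean(frames.^2,1);
isActive = energy > 0.01*max(energy);   % hypothetical threshold, tune per dataset

% Keep everything between the first and the last active frame.
firstFrame = find(isActive,1,'first');
lastFrame = find(isActive,1,'last');
trimmed = audioIn((firstFrame-1)*frameLen+1 : lastFrame*frameLen);

% trimmed can now be passed to mfcc(trimmed,fs).
```

Because the trim points are found per file, this works regardless of whether a file ends with 0.5 s or 0.7 s of silence. Pauses inside the active region are kept, since only the leading and trailing silence is cut.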
Whatever method you choose, if the end goal is some kind of machine learning, make sure to mimic the same steps for inference.
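On the original question of why the silent frames are not 0: MFCCs are computed from the logarithm of band energies, so frames with near-zero energy map to large negative log values rather than to 0. The following is a simplified illustration in base MATLAB, not the toolbox's internal implementation. The band count, the energy value, and the hand-written DCT are all assumptions chosen for readability.

```matlab
% Hypothetical mel-band energies for one near-silent frame (tiny, not zero).
numBands = 8;
bandEnergy = 1e-10*ones(numBands,1);

% Log compression: tiny energies become large negative numbers.
logEnergy = log10(bandEnergy);          % every band is -10

% Unnormalized DCT-II written out explicitly so no toolbox is needed.
k = (0:numBands-1)';                    % coefficient index (column)
n = (0:numBands-1);                     % band index (row)
dctMatrix = cos(pi*k*(2*n+1)/(2*numBands));
coeffs = dctMatrix*logEnergy;

disp(coeffs(1))                         % -80, far from zero
```

In this sketch the first coefficient is -80 while the others are numerically zero, because the hypothetical spectrum is perfectly flat. In real recordings, low-level noise makes the remaining coefficients nonzero as well.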