Advice for Audio classifier based on Voice Activity Detection

I am writing a program to classify recorded phone-call audio files (WAV) into two classes: files that contain at least some human voice, and non-voice files (only DTMF, dial tones, ringtones, or noise). I tried implementing a simple VAD (voice activity detector) using ZCR (zero crossing rate) and energy, but these parameters confuse DTMF and dial-tone files with voice.
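To make the ZCR/energy problem concrete, here is a minimal sketch of frame-wise ZCR and energy in NumPy. The 8 kHz sampling rate, the 25 ms frame / 10 ms hop, and the synthetic DTMF and noise signals are assumptions for illustration, not the poster's actual data or code.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def zcr_and_energy(x, frame_len=200, hop=80):
    """Return per-frame zero crossing rate and mean-square energy."""
    frames = frame_signal(x, frame_len, hop)
    signs = np.signbit(frames)
    # Fraction of adjacent sample pairs whose sign differs.
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    energy = np.mean(frames ** 2, axis=1)
    return zcr, energy

fs = 8000  # assumed telephone-band sampling rate
t = np.arange(fs) / fs

# Synthetic DTMF digit "1" (697 Hz + 1209 Hz) and low-level background noise.
dtmf = 0.5 * np.sin(2 * np.pi * 697 * t) + 0.5 * np.sin(2 * np.pi * 1209 * t)
rng = np.random.default_rng(0)
noise = 0.01 * rng.standard_normal(fs)

zcr_dtmf, en_dtmf = zcr_and_energy(dtmf)
zcr_noise, en_noise = zcr_and_energy(noise)

print("DTMF  mean ZCR / energy:", zcr_dtmf.mean(), en_dtmf.mean())
print("noise mean ZCR / energy:", zcr_noise.mean(), en_noise.mean())
```

The tone is far more energetic than the background noise, so an energy threshold set to catch speech will also fire on it. That is consistent with the confusion described above.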
I also tried a machine-learning approach using an SVM (Support Vector Machine) with MFCCs (mel-frequency cepstral coefficients). The results were worse than with the previous approach.
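For reference, an SVM-on-MFCC setup in scikit-learn could look like the sketch below. It assumes each file has already been reduced to one 13-dimensional feature vector (for example, MFCCs averaged over frames). The random arrays stand in for those features and are purely hypothetical. Standardising the features before an RBF-kernel SVM and scoring with cross-validation are common practice, not necessarily what was done in the attempt described here.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_per_class, n_mfcc = 100, 13

# Placeholder features: one vector per file. Replace with real MFCC summaries.
X_voice = rng.normal(loc=1.0, scale=1.0, size=(n_per_class, n_mfcc))
X_nonvoice = rng.normal(loc=-1.0, scale=1.0, size=(n_per_class, n_mfcc))
X = np.vstack([X_voice, X_nonvoice])
y = np.array([1] * n_per_class + [0] * n_per_class)  # 1 = voice, 0 = non-voice

# Scale each feature to zero mean / unit variance, then fit an RBF SVM.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))

# 5-fold cross-validated accuracy gives a less optimistic estimate
# than scoring on the training files themselves.
scores = cross_val_score(clf, X, y, cv=5)
print("mean accuracy:", scores.mean())
```

On real recordings the accuracy will depend heavily on how the per-file features are built, so the number printed here says nothing about the actual task.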
I need someone to advise me a little on this domain. I have no previous experience in machine learning or AI, but I am willing to put a good amount of time into it.
I am comfortable working in MATLAB and in Python (SciPy, NumPy, scikit-learn).
Thank you
  1 Comment
Md Sahidullah on 4 Jun 2015
Hi! You can try an unsupervised technique. For speech/non-speech discrimination, I have found bi-Gaussian modeling very effective for speaker recognition, especially in noisy environments.
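One possible reading of bi-Gaussian modeling is sketched below: fit a two-component Gaussian mixture to the per-frame log-energies of a recording and treat the component with the higher mean as speech. The choice of log-energy as the modelled feature, the EM initialisation, and the synthetic data are all assumptions, since the comment does not give these details.

```python
import numpy as np

def fit_bigaussian(x, n_iter=50):
    """Fit a 2-component 1-D Gaussian mixture to x with plain EM."""
    x = np.asarray(x, dtype=float)
    mu = np.percentile(x, [25, 75])      # initial means: lower/upper quartile
    var = np.full(2, x.var() + 1e-6)     # shared initial variance
    w = np.array([0.5, 0.5])             # initial mixture weights
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each frame.
        log_p = (np.log(w) - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (x[:, None] - mu) ** 2 / var)
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and variances.
        nk = resp.sum(axis=0) + 1e-12
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
    return w, mu, var, resp

# Synthetic log-energies: 300 quiet background frames, 200 louder frames.
rng = np.random.default_rng(0)
log_energy = np.concatenate([rng.normal(-8.0, 0.5, 300),
                             rng.normal(-2.0, 1.0, 200)])

w, mu, var, resp = fit_bigaussian(log_energy)
speech_comp = int(np.argmax(mu))            # higher-mean component
is_speech = resp[:, speech_comp] > 0.5      # per-frame decision
print("component means:", np.sort(mu))
print("fraction of frames flagged:", is_speech.mean())
```

No labels or hand-tuned threshold are needed, because the split adapts to each recording. With energy as the only feature, though, a loud tone would land in the same high-energy component as speech.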
You can also try different clustering approaches, with MFCCs as the front end, to classify your audio segments.
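A minimal sketch of the clustering idea, assuming k-means with two clusters as the clustering method and one MFCC-style vector per audio segment. Both the method and the synthetic feature vectors are illustrative assumptions. Any other clustering algorithm could be swapped in.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
n_mfcc = 13

# Placeholder segment features drawn from two different distributions.
seg_a = rng.normal(0.0, 1.0, size=(150, n_mfcc))
seg_b = rng.normal(3.0, 1.0, size=(150, n_mfcc))
X = StandardScaler().fit_transform(np.vstack([seg_a, seg_b]))

# Group the segments into two clusters without using any labels.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(X)

# Cluster ids are arbitrary, so compare with the known grouping both ways.
true = np.array([0] * 150 + [1] * 150)
agreement = max(np.mean(labels == true), np.mean(labels != true))
print("agreement with the synthetic grouping:", agreement)
```

On real data, which cluster corresponds to voice would still have to be decided afterwards, for example by listening to a few segments from each cluster.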
Hope it helps. Thanks, Sahid


Answers (0)
