Implementing speech emotion recognition using CNN
Show older comments
Hello everyone, I'm working on speech emotion recognition. I have a training matrix of size 60,000 x 39 and a label matrix of size 60,000 x 1. I would like to implement a Convolutional Neural Network (CNN). I've tried the code below, but it didn't work well. I believe I may have missed something. Could anyone please help me?
%% CNN
layers = [
imageInputLayer([39 1 1])
convolution2dLayer(3,16,'Padding','same')
reluLayer
fullyConnectedLayer(384) % 384 refers to number of neurons in next FC hidden layer
fullyConnectedLayer(384) % 384 refers to number of neurons in next FC hidden layer
fullyConnectedLayer(7) % refers to number of neurons in next output layer (number of output classes)
softmaxLayer
classificationLayer];
options = trainingOptions('sgdm',...
'MaxEpochs',500, ...
'Verbose',false,...
'Plots','training-progress');
XTrain=reshape(mfcc_matrix_app, [39,1,1,60575]);
test=reshape(mfcc_matrix_app, [39,1,1,30000]);
targetD=categorical(CA);
net = trainNetwork(XTrain,targetD,layers,options);
predictedLabels = classify(net,test);
Accepted Answer
More Answers (1)
young
on 8 Apr 2024
Hi Hamza,
May I ask if you have solved this problem. I am doing a similar SER project. I extract features like MFCC, Mel, Pitch and Intensity. Then, I got a traing matrix of size 936x1 cell (200x1 double in every cell) and a label matrix of size 1x936 categorical.
The .mat is attached
My 1dCNN code is like below.
load('audio_features.mat', 'X_train', 'X_test', 'y_train', 'y_test');
x_traincnn = num2cell(X_train, 2);
y_traincnn = categorical(y_train.');
x_testcnn = num2cell(X_test, 2);
y_testcnn = categorical(y_test.');
x_traincnn = cellfun(@(x) x', x_traincnn, 'UniformOutput', false);
x_testcnn = cellfun(@(x) x', x_testcnn, 'UniformOutput', false);
disp(size(x_traincnn));
disp(size(x_testcnn));
disp(size(y_traincnn));
disp(size(y_testcnn));
numFeatures = 200;
numClasses = numel(categories(y_train));
filterSize = 5;
numFilters = 32;
rng('default');
layers = [ ...
sequenceInputLayer(numFeatures)
convolution1dLayer(filterSize,numFilters,Padding="causal")
reluLayer
layerNormalizationLayer
convolution1dLayer(filterSize,2*numFilters,Padding="causal")
reluLayer
layerNormalizationLayer
globalAveragePooling1dLayer
fullyConnectedLayer(numClasses)
softmaxLayer
classificationLayer];
miniBatchSize = 27;
options = trainingOptions("adam", ...
MaxEpochs=200, ...
InitialLearnRate=0.01, ...
SequencePaddingDirection="left", ...
ValidationData={x_testcnn,y_testcnn}, ...
Plots="training-progress", ...
Verbose=0);
net = trainNetwork(x_traincnn, y_traincnn, layers, options);
YPred = classify(net,x_testcnn, ...
SequencePaddingDirection="left");
acc = mean(YPred == y_testcnn);
disp(["Accuracy: ", acc]);
confMat = confusionmat(y_testcnn, YPred);
disp(confMat);
figure;
confusionchart(y_testcnn,YPred);
The output is :


Categories
Find more on AI for Audio in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!