Low LSTM Accuracy in Speech Recognition

Question

Hamza on 31 Oct 2023

https://nl.mathworks.com/matlabcentral/answers/2041026-low-lstm-accuracy-in-speech-recognition

Commented: Christopher McCausland on 6 Nov 2023

Hello everyone, I am applying an LSTM to speech emotion recognition. I extracted MFCC features, which gave a matrix of dimensions 60,575 × 39. I then converted this matrix into a 280 × 1 cell array named "AllCellTrain", whose entries are signals of varying length, as illustrated in the image below. I passed "AllCellTrain" to the trainNetwork function along with the labels YCA, the network layers, and the training options. However, the accuracy is very low, only around 20%. I'm unsure where I may have made a mistake. Could someone please offer some assistance?

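num_features = 39;        % MFCC coefficients per frame (columns of the feature matrix)
num_classes = 7;          % number of emotion classes
num_hidden_units = 1024;

% Define the network. Each cell of AllCellTrain must be a
% num_features-by-numTimeSteps matrix for sequenceInputLayer.
layers = [
    sequenceInputLayer(num_features)
    lstmLayer(num_hidden_units, 'OutputMode', 'last')
    fullyConnectedLayer(num_classes)
    softmaxLayer
    classificationLayer];

% Specify the training options
max_epochs = 36;
mini_batch_size = 28;
initial_learning_rate = 0.001;
% Note: 'SequenceLength','shortest' truncates every sequence in a
% mini-batch to the length of the shortest one in that mini-batch.
options = trainingOptions('adam', ...
    'MaxEpochs', max_epochs, ...
    'MiniBatchSize', mini_batch_size, ...
    'InitialLearnRate', initial_learning_rate, ...
    'SequenceLength', 'shortest', ...
    'Shuffle', 'every-epoch', ...
    'ExecutionEnvironment', 'gpu', ...
    'Verbose', false, ...
    'Plots', 'training-progress');

% Train, then evaluate on the held-out test set
net = trainNetwork(AllCellTrain, YCA, layers, options);
predicted_labels = classify(net, AllCellTest, 'ExecutionEnvironment', 'gpu');
acc = mean(predicted_labels == YCT)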
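
Two things may be worth checking before training, sketched below under assumptions. trainNetwork expects each sequence as a numFeatures-by-numTimeSteps matrix, so sequences cut row-wise from a frames × 39 matrix would need transposing. In addition, 'SequenceLength','shortest' discards every frame beyond the shortest sequence in each mini-batch. The stand-in data here is random and hypothetical; it only mimics the shapes described in the question.

```matlab
% Hypothetical stand-in for the real data: 280 sequences stored as
% numTimeSteps-by-39 (the ASSUMED layout, not confirmed by the question).
num_features = 39;
rng(0);
AllCellTrain = arrayfun(@(T) randn(T, num_features), ...
    randi([100 300], 280, 1), 'UniformOutput', false);

% Transpose any sequence that is numTimeSteps-by-numFeatures so that
% rows are features and columns are time steps.
for i = 1:numel(AllCellTrain)
    sz = size(AllCellTrain{i});
    if sz(1) ~= num_features && sz(2) == num_features
        AllCellTrain{i} = AllCellTrain{i}.';
    end
end

% Inspect the spread of sequence lengths. A wide spread means that
% 'shortest' throws away a lot of frames in each mini-batch.
seq_lengths = cellfun(@(x) size(x, 2), AllCellTrain);
fprintf('Sequence length: min %d, median %g, max %d\n', ...
    min(seq_lengths), median(seq_lengths), max(seq_lengths));

% Sort by length so each mini-batch holds sequences of similar length.
% The labels must be reordered with the same index, e.g. YCA = YCA(idx).
[~, idx] = sort(seq_lengths);
AllCellTrain = AllCellTrain(idx);
```

If the sequences are sorted like this, 'Shuffle' would have to be set to 'never' for the ordering to survive, and 'SequenceLength','longest' (padding) could be tried in place of 'shortest'. Whether either change helps with this dataset is not something the thread establishes.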

4 Comments

Hamza on 6 Nov 2023

Edited: Hamza on 6 Nov 2023

Hi @Christopher McCausland, thanks for your answer. I am trying to classify 7 emotion classes. For your information, I used the same data with a 1D CNN and got 90% accuracy, so I don't know what the issue is with the LSTM. Also, when I shuffled the columns (the features) I got a different result, which shouldn't be the case. Please find the training curve attached. Thanks in advance.

Christopher McCausland on 6 Nov 2023

Hi @Hamza,

To me this looks like classic overfitting: your model appears to train well and learn features, but these features are overfitted to the training data and are not representative of generalised data.

A few things to consider:

Do you have multiple speakers? If so, how do you pick which speakers are in the test/train sets?
You have 280 input sequences and seven classes. If the data is perfectly balanced, you have 40 observations per class. Is this enough?
Can you include a validation split to help prevent overfitting?
These are just a few ways to prevent overfitting and ensure your data is appropriate for training. There are many others, which I would suggest you take a look at.
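
The checks above could be sketched roughly as follows. This is a minimal illustration with hypothetical stand-in data: the variable speaker_id, the choice of 10 speakers, the two held-out speakers, and the validation settings are all assumptions, not details from the question.

```matlab
% Hypothetical stand-ins for the real sequences, labels and speaker IDs.
rng(0);
AllCellTrain = arrayfun(@(T) randn(39, T), randi([100 300], 280, 1), ...
    'UniformOutput', false);
YCA = categorical(randi(7, 280, 1));
speaker_id = randi(10, 280, 1);      % assumption: one speaker ID per sequence

% 1) Class balance: number of observations per emotion class.
class_names = categories(YCA);
class_counts = countcats(YCA);
for k = 1:numel(class_names)
    fprintf('Class %s: %d sequences\n', class_names{k}, class_counts(k));
end

% 2) Speaker-independent hold-out: every sequence from the held-out
%    speakers goes to validation, so no speaker appears in both sets.
speakers = unique(speaker_id);
val_speakers = speakers(randperm(numel(speakers), 2));
is_val = ismember(speaker_id, val_speakers);
XTrain = AllCellTrain(~is_val);  YTrain = YCA(~is_val);
XVal   = AllCellTrain(is_val);   YVal   = YCA(is_val);

% 3) Monitor validation loss during training and stop early when it
%    stops improving (requires Deep Learning Toolbox).
options = trainingOptions('adam', ...
    'MaxEpochs', 36, ...
    'MiniBatchSize', 28, ...
    'ValidationData', {XVal, YVal}, ...
    'ValidationFrequency', 10, ...
    'ValidationPatience', 5, ...
    'Verbose', false);
```

With these options, the training-progress plot would show validation accuracy alongside training accuracy, which makes a gap between the two easier to spot.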

In terms of the CNN performance, were the test/train sets the same, and how many epochs did you train the CNN for?

Answers (0)
