
Issues with LSTM prediction due to normalization layer settings

Hello, I've recently tried to create an LSTM seq2seq model using multiple-input, multiple-output data. It's simulation data, so the timesteps are correlated, and I use 'sequence' as the output mode for all LSTM layers. I've had a look at the tutorial cases, and my situation most closely resembles the turbofan tutorial: https://www.mathworks.com/help/deeplearning/ug/sequence-to-sequence-regression-using-deep-learning.html
I tried both manual normalization and the sequenceInputLayer options. In the latter case there are issues with the prediction. I'll attach what I'm doing as code below (I left out the sequence sorting, which can be found in the turbofan tutorial). This code uses a noisy linear trend for training and validation instead of my real data. I'll attach some prediction plots using the actual data. I've confirmed that the example below reproduces the same issue as my code.
The alternative to the code below is to do all the steps the same, but instead use
net = trainNetwork(XTrain,YTrain,Layers,options);
for training and
sequenceInputLayer(numFeatures,"Normalization","rescale-symmetric")
as the input layer (consolidated in the sketch after the plotting snippet below). Finally, the plots can be created with:
PYVal = predict(net,XVal,'MiniBatchSize',1);
% Which sequence to plot
i = 1;
% Plot validation targets and predictions
figure
tiledlayout(1,numResponses,"TileSpacing","tight"); % create the layout after the figure
for j = 1:numResponses
    nexttile
    TargetY = YVal{i}(j,:);
    PredictionY = PYVal{i}(j,:);
    plot(TargetY,'k')
    hold on
    plot(PredictionY,'b')
    hold off
    legend('YVal','PYVal')
    % Remove axis ticks to fit more plots:
    set(gca,'xtick',[],'ytick',[])
end
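Put together, the alternative variant only swaps the input layer and trains on the unnormalized data; a minimal sketch, using the same layer stack and options defined in the full code below:
% Alternative: built-in input normalization instead of mapminmax
Layers = [ ...
    sequenceInputLayer(numFeatures,"Normalization","rescale-symmetric")
    lstmLayer(numHiddenUnits,'OutputMode','sequence')
    dropoutLayer(0.5,'Name','dropout')
    lstmLayer(numHiddenUnits,'OutputMode','sequence')
    dropoutLayer(0.5,'Name','dropout_2')
    fullyConnectedLayer(numResponses)
    regressionLayer];
net = trainNetwork(XTrain,YTrain,Layers,options); % targets stay unnormalized here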
Can someone explain to me what is not working correctly with the sequenceInputLayer options and how to fix it?
Now the code (it can't be run in the preview, so you may have to run it locally):
Note: I edited this post. It occurred to me that providing you with an example using completely randomly distributed training and validation data doesn't give the ANN any valuable learnable patterns.
% Define parameters
numSequences = 3;
numFeatures = 4;
numResponses = 5;
numTimesteps = 100;
Interval = [0.1 0.9];
StartTimesteps = round(numTimesteps*Interval(1)); % integer step indices
EndTimesteps = round(numTimesteps*Interval(2));
% Initialize cells
XTrain = cell(numSequences,1);
YTrain = cell(numSequences,1);
XVal = cell(numSequences,1);
YVal = cell(numSequences,1);
% Initialize normalized cells
NX_Train = cell([numSequences 1]);
NY_Train = cell([numSequences 1]);
NX_Val = cell([numSequences 1]);
NY_Val = cell([numSequences 1]);
% Initialize Help Variables
HelpXT = zeros(numFeatures,numTimesteps);
HelpXV = zeros(numFeatures,numTimesteps);
HelpYT = zeros(numResponses,numTimesteps);
HelpYV = zeros(numResponses,numTimesteps);
% Fill Input Data with a noisy function
for s = 1:numSequences
    for i = 1:numFeatures
        % constant level with noise
        for j = 1:StartTimesteps
            HelpXT(i,j) = randn + i*5;
            HelpXV(i,j) = randn + i*5;
        end
        % noisy linear ramp (start after StartTimesteps to avoid overwriting)
        for j = StartTimesteps+1:EndTimesteps
            HelpXT(i,j) = randn + i*5 + 0.2*j;
            HelpXV(i,j) = randn + i*5 + 0.2*j;
        end
        % constant plateau continuing the ramp's final value
        k = 0.2*EndTimesteps;
        for j = EndTimesteps+1:numTimesteps
            HelpXT(i,j) = randn + i*5 + k;
            HelpXV(i,j) = randn + i*5 + k;
        end
    end
    XTrain{s} = HelpXT;
    XVal{s} = HelpXV;
end
clear k
% Fill Output Data with noisy linear trend
for s = 1:numSequences
    for i = 1:numResponses
        % constant level with noise
        for j = 1:StartTimesteps
            HelpYT(i,j) = randn + i*5;
            HelpYV(i,j) = randn + i*5;
        end
        % noisy linear ramp
        for j = StartTimesteps+1:EndTimesteps
            HelpYT(i,j) = randn + i*5 + 0.2*j;
            HelpYV(i,j) = randn + i*5 + 0.2*j;
        end
        % constant plateau continuing the ramp's final value
        k = 0.2*EndTimesteps;
        for j = EndTimesteps+1:numTimesteps
            HelpYT(i,j) = randn + i*5 + k;
            HelpYV(i,j) = randn + i*5 + k;
        end
    end
    YTrain{s} = HelpYT;
    YVal{s} = HelpYV;
end
clear k
% Normalize the first dataset
[NX_Train{1},SX_Train] = mapminmax(XTrain{1});
[NY_Train{1},SY_Train] = mapminmax(YTrain{1});
% Normalize all remaining datasets using the same options
for i = 2:numel(XTrain)
    NX_Train{i} = mapminmax('apply',XTrain{i},SX_Train);
    NY_Train{i} = mapminmax('apply',YTrain{i},SY_Train);
end
for i = 1:numel(XVal)
    NX_Val{i} = mapminmax('apply',XVal{i},SX_Train);
    NY_Val{i} = mapminmax('apply',YVal{i},SY_Train);
end
% Define network options:
numHiddenUnits = 3;
miniBatchSize = 1;
% Define network architecture
Layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits,'OutputMode','sequence')
    dropoutLayer(0.5,'Name','dropout')
    lstmLayer(numHiddenUnits,'OutputMode','sequence')
    dropoutLayer(0.5,'Name','dropout_2')
    fullyConnectedLayer(numResponses)
    regressionLayer];
% Define training options
maxEpochs = 100;
InitialLearnRate = 1e-2;
Shuffle = 'every-epoch';
Plots = 'training-progress';
GradientThreshold = 1;
Verbose = 0;
ValidationData = {NX_Val, NY_Val};
ValidationFrequency = 1;
OutputNetwork = 'best-validation-loss';
L2Regularization = 0.05;
% Save training options
options = trainingOptions('adam', ...
    'MaxEpochs',maxEpochs, ...
    'MiniBatchSize',miniBatchSize, ...
    'InitialLearnRate',InitialLearnRate, ...
    'GradientThreshold',GradientThreshold, ...
    'Shuffle',Shuffle, ...
    'Plots',Plots, ...
    'Verbose',Verbose, ...
    'ValidationData',ValidationData, ...
    'ValidationFrequency',ValidationFrequency, ...
    'OutputNetwork',OutputNetwork, ...
    'L2Regularization',L2Regularization);
% Train the network
net = trainNetwork(NX_Train,NY_Train,Layers,options);
% Predict with the network on the validation data
PN_YVal = predict(net,NX_Val,'MiniBatchSize',1);
% initialize renormalized values
A = cell(size(XTrain,1),1); % XTrain
B = cell(size(XTrain,1),1); % YTrain
C = cell(size(XTrain,1),1); % XVal
D = cell(size(XTrain,1),1); % YVal
E = cell(size(XTrain,1),1); % PYVal
% renormalize data
% you can compare elements of A with XTrain, etc., as sanity check
for i = 1:size(XTrain,1)
    A{i} = mapminmax('reverse',NX_Train{i},SX_Train);
    B{i} = mapminmax('reverse',NY_Train{i},SY_Train);
    C{i} = mapminmax('reverse',NX_Val{i},SX_Train);
    D{i} = mapminmax('reverse',NY_Val{i},SY_Train);
    E{i} = mapminmax('reverse',PN_YVal{i},SY_Train);
end
% Which sequence to plot
i = 1;
% plot validation targets and predictions
figure
tiledlayout(1,numResponses,"TileSpacing","tight"); % create the layout after the figure
for j = 1:numResponses
    nexttile
    TargetY = D{i}(j,:);
    PredictionY = E{i}(j,:);
    plot(TargetY,'k')
    hold on
    plot(PredictionY,'b')
    hold off
    legend('YVal','PYVal')
    % Remove axis ticks to fit more plots:
    set(gca,'xtick',[],'ytick',[])
end

Accepted Answer

Neha on 12 Oct 2023
Hi Patrick,
I understand that you are facing issues with normalizing the training data for the LSTM, and that you do not get correct predictions when the sequenceInputLayer normalization options are used instead of mapminmax. When you used mapminmax you scaled both the X and the Y data, but when you normalized at the input layer, only the input data was normalized.
You can therefore normalize the output data using mapminmax and rescale the input data at the sequenceInputLayer:
[NY_Train{1},SY_Train] = mapminmax(YTrain{1});
for i = 2:numel(XTrain)
    NY_Train{i} = mapminmax('apply',YTrain{i},SY_Train);
end
for i = 1:numel(XVal)
    NY_Val{i} = mapminmax('apply',YVal{i},SY_Train);
end
By specifying Max and Min, the normalization is analogous to mapminmax (both map to [-1, 1]), but this is not mandatory; specifying only the type of normalization is sufficient.
sequenceInputLayer(numFeatures, "Normalization","rescale-symmetric", "Max", max(XTrain{1},[],2),"Min", min(XTrain{1},[],2))
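With multiple training sequences, the statistics would arguably be pooled over all of them rather than taken from XTrain{1} alone; a minimal sketch (XAll is an assumed helper name, not from the original code):
% Pool per-feature statistics over every training sequence
XAll = cat(2,XTrain{:}); % numFeatures x total timesteps
sequenceInputLayer(numFeatures,"Normalization","rescale-symmetric", ...
    "Max",max(XAll,[],2),"Min",min(XAll,[],2))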
In general, it is not necessary to normalize the output data of an LSTM network; whether it helps depends on the specific task and the range of values the output can take. If the output values span a wide range, normalization can be beneficial, particularly when the outputs have high variance or are sensitive to scale differences.
Hope this helps!
  1 Comment
Patrick Sontheimer on 18 Oct 2023
Hello Neha,
thank you for your answer; it's much appreciated. I think you're right. Additionally, I used several sequences and so far only normalized with the first sequence's max and min values as a stopgap measure. From now on I'll normalize all values without the sequenceInputLayer options, and I'll calculate the min and max values from all combined sequences, both for the input and the output. Scale difference is a concern for the outputs. I've asked around and was told I might appreciate using a custom loss function to address the issue (see the sketch below), as it can weight differently scaled outputs with different priorities and removes the need to normalize the output, which makes the RMSE interpretable again (same dimension as the output values). I still have issues with the accuracy of my models, but I think the original question has been addressed.
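For reference, a minimal sketch of such a custom loss as a weighted-MSE regression output layer; the class name and the per-response weighting scheme are illustrative assumptions, not something from this thread. In the classic trainNetwork workflow it would replace regressionLayer at the end of the layer array:
% Hypothetical weighted-MSE output layer (sketch): per-response weights let
% differently scaled outputs contribute comparably without normalizing Y.
classdef weightedRegressionLayer < nnet.layer.RegressionLayer
    properties
        Weights % [numResponses x 1] per-response weights (assumption)
    end
    methods
        function layer = weightedRegressionLayer(weights,name)
            layer.Name = name;
            layer.Weights = weights(:);
        end
        function loss = forwardLoss(layer,Y,T)
            % Y,T have size numResponses x miniBatchSize x sequenceLength
            loss = sum(layer.Weights .* (Y - T).^2,'all') / numel(T);
        end
    end
end
The weights could, for example, be the inverse variances of each response computed over the training targets, so that large-magnitude responses do not dominate the loss.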

Release: R2022b