
Issues with LSTM prediction due to normalization layer settings

Hello, I've recently tried to create an LSTM seq2seq model using multiple-input, multiple-output data. It's simulation data, so the timesteps are correlated, and I use 'sequence' as the output mode for all LSTM layers. I've had a look at the tutorial cases, and my situation most closely resembles the turbofan tutorial: https://www.mathworks.com/help/deeplearning/ug/sequence-to-sequence-regression-using-deep-learning.html
I tried both manual normalization and the sequenceInputLayer options. In the latter case there are issues with the prediction. I'll attach what I'm doing as code below (I left out the sequence sorting, which can be found in the turbofan tutorial). This code uses a noisy linear trend for training and validation instead of my real data. I'll attach some prediction plots using the actual data. I've confirmed that the example below reproduces the same issue as my code.
The alternative to the code below is to do all the steps the same, but instead use
net = trainNetwork(XTrain,YTrain,Layers,options);
for training and
sequenceInputLayer(numFeatures,"Normalization","rescale-symmetric")
as the input layer (consolidated in the sketch after the plotting snippet below). Finally, the plots can be created with:
PYVal = predict(net,XVal,'MiniBatchSize',1);
% Which sequence to plot
i = 1;
% Plot validation targets and predictions
figure
tiledlayout(1,numResponses,"TileSpacing","tight"); % create the layout after the figure
for j = 1:numResponses
    nexttile
    TargetY = YVal{i}(j,:);
    PredictionY = PYVal{i}(j,:);
    plot(TargetY,'k')
    hold on
    plot(PredictionY,'b')
    hold off
    legend('YVal','PYVal')
    % Remove axis ticks to fit more plots:
    set(gca,'xtick',[],'ytick',[])
end
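Put together, the alternative variant only swaps the input layer and trains on the unnormalized data; a minimal sketch, using the same layer stack and options defined in the full code below:
% Alternative: built-in input normalization instead of mapminmax
Layers = [ ...
    sequenceInputLayer(numFeatures,"Normalization","rescale-symmetric")
    lstmLayer(numHiddenUnits,'OutputMode','sequence')
    dropoutLayer(0.5,'Name','dropout')
    lstmLayer(numHiddenUnits,'OutputMode','sequence')
    dropoutLayer(0.5,'Name','dropout_2')
    fullyConnectedLayer(numResponses)
    regressionLayer];
net = trainNetwork(XTrain,YTrain,Layers,options); % targets stay unnormalized here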
Can someone explain to me what is not working correctly with the sequenceInputLayer options and how to fix it?
Now the code (it can't be run in the preview, so you may have to run it locally):
Note: I edited this post. It occurred to me that providing you with an example using completely randomly distributed training and validation data doesn't give the ANN any valuable learnable patterns.
% Define parameters
numSequences = 3;
numFeatures = 4;
numResponses = 5;
numTimesteps = 100;
Interval = [0.1 0.9];
StartTimesteps = round(numTimesteps*Interval(1)); % integer step indices
EndTimesteps = round(numTimesteps*Interval(2));
% Initialize cells
XTrain = cell(numSequences,1);
YTrain = cell(numSequences,1);
XVal = cell(numSequences,1);
YVal = cell(numSequences,1);
% Initialize normalized cells
NX_Train = cell([numSequences 1]);
NY_Train = cell([numSequences 1]);
NX_Val = cell([numSequences 1]);
NY_Val = cell([numSequences 1]);
% Initialize Help Variables
HelpXT = zeros(numFeatures,numTimesteps);
HelpXV = zeros(numFeatures,numTimesteps);
HelpYT = zeros(numResponses,numTimesteps);
HelpYV = zeros(numResponses,numTimesteps);
% Fill Input Data with a noisy function
for s = 1:numSequences
    for i = 1:numFeatures
        % constant level with noise
        for j = 1:StartTimesteps
            HelpXT(i,j) = randn + i*5;
            HelpXV(i,j) = randn + i*5;
        end
        % noisy linear ramp (start after StartTimesteps to avoid overwriting)
        for j = StartTimesteps+1:EndTimesteps
            HelpXT(i,j) = randn + i*5 + 0.2*j;
            HelpXV(i,j) = randn + i*5 + 0.2*j;
        end
        % constant plateau continuing the ramp's final value
        k = 0.2*EndTimesteps;
        for j = EndTimesteps+1:numTimesteps
            HelpXT(i,j) = randn + i*5 + k;
            HelpXV(i,j) = randn + i*5 + k;
        end
    end
    XTrain{s} = HelpXT;
    XVal{s} = HelpXV;
end
clear k
% Fill Output Data with noisy linear trend
for s = 1:numSequences
    for i = 1:numResponses
        % constant level with noise
        for j = 1:StartTimesteps
            HelpYT(i,j) = randn + i*5;
            HelpYV(i,j) = randn + i*5;
        end
        % noisy linear ramp
        for j = StartTimesteps+1:EndTimesteps
            HelpYT(i,j) = randn + i*5 + 0.2*j;
            HelpYV(i,j) = randn + i*5 + 0.2*j;
        end
        % constant plateau continuing the ramp's final value
        k = 0.2*EndTimesteps;
        for j = EndTimesteps+1:numTimesteps
            HelpYT(i,j) = randn + i*5 + k;
            HelpYV(i,j) = randn + i*5 + k;
        end
    end
    YTrain{s} = HelpYT;
    YVal{s} = HelpYV;
end
clear k
% Normalize the first dataset
[NX_Train{1},SX_Train] = mapminmax(XTrain{1});
[NY_Train{1},SY_Train] = mapminmax(YTrain{1});
% Normalize all remaining datasets using the same options
for i = 2:numel(XTrain)
    NX_Train{i} = mapminmax('apply',XTrain{i},SX_Train);
    NY_Train{i} = mapminmax('apply',YTrain{i},SY_Train);
end
for i = 1:numel(XVal)
    NX_Val{i} = mapminmax('apply',XVal{i},SX_Train);
    NY_Val{i} = mapminmax('apply',YVal{i},SY_Train);
end
% Define network options:
numHiddenUnits = 3;
miniBatchSize = 1;
% Define network architecture
Layers = [ ...
    sequenceInputLayer(numFeatures)
    lstmLayer(numHiddenUnits,'OutputMode','sequence')
    dropoutLayer(0.5,'Name','dropout')
    lstmLayer(numHiddenUnits,'OutputMode','sequence')
    dropoutLayer(0.5,'Name','dropout_2')
    fullyConnectedLayer(numResponses)
    regressionLayer];
% Define training options
maxEpochs = 100;
InitialLearnRate = 1e-2;
Shuffle = 'every-epoch';
Plots = 'training-progress';
GradientThreshold = 1;
Verbose = 0;
ValidationData = {NX_Val, NY_Val};
ValidationFrequency = 1;
OutputNetwork = 'best-validation-loss';
L2Regularization = 0.05;
% Save training options
options = trainingOptions('adam', ...
    'MaxEpochs',maxEpochs, ...
    'MiniBatchSize',miniBatchSize, ...
    'InitialLearnRate',InitialLearnRate, ...
    'GradientThreshold',GradientThreshold, ...
    'Shuffle',Shuffle, ...
    'Plots',Plots, ...
    'Verbose',Verbose, ...
    'ValidationData',ValidationData, ...
    'ValidationFrequency',ValidationFrequency, ...
    'OutputNetwork',OutputNetwork, ...
    'L2Regularization',L2Regularization);
% Train the network
net = trainNetwork(NX_Train,NY_Train,Layers,options);
% Predict with the network on the validation data
PN_YVal = predict(net,NX_Val,'MiniBatchSize',1);
% initialize renormalized values
A = cell(size(XTrain,1),1); % XTrain
B = cell(size(XTrain,1),1); % YTrain
C = cell(size(XTrain,1),1); % XVal
D = cell(size(XTrain,1),1); % YVal
E = cell(size(XTrain,1),1); % PYVal
% renormalize data
% you can compare elements of A with XTrain, etc., as sanity check
for i = 1:size(XTrain,1)
    A{i} = mapminmax('reverse',NX_Train{i},SX_Train);
    B{i} = mapminmax('reverse',NY_Train{i},SY_Train);
    C{i} = mapminmax('reverse',NX_Val{i},SX_Train);
    D{i} = mapminmax('reverse',NY_Val{i},SY_Train);
    E{i} = mapminmax('reverse',PN_YVal{i},SY_Train);
end
% Which sequence to plot
i = 1;
% plot validation targets and predictions
figure
tiledlayout(1,numResponses,"TileSpacing","tight"); % create the layout after the figure
for j = 1:numResponses
    nexttile
    TargetY = D{i}(j,:);
    PredictionY = E{i}(j,:);
    plot(TargetY,'k')
    hold on
    plot(PredictionY,'b')
    hold off
    legend('YVal','PYVal')
    % Remove axis ticks to fit more plots:
    set(gca,'xtick',[],'ytick',[])
end

Accepted Answer

Neha on 12 Oct 2023
Hi Patrick,
I understand that you are facing issues with normalizing the training data for the LSTM, and that you do not get correct predictions when the sequenceInputLayer normalization options are used instead of mapminmax. When you used mapminmax you scaled both the X and the Y data, but when you normalized at the input layer, only the input data was normalized.
You can therefore normalize the output data using mapminmax and rescale the input data at the sequenceInputLayer:
[NY_Train{1},SY_Train] = mapminmax(YTrain{1});
for i = 2:numel(XTrain)
    NY_Train{i} = mapminmax('apply',YTrain{i},SY_Train);
end
for i = 1:numel(XVal)
    NY_Val{i} = mapminmax('apply',YVal{i},SY_Train);
end
By specifying Max and Min, the normalization is analogous to mapminmax (both map to [-1, 1]), but this is not mandatory; specifying only the type of normalization is sufficient.
sequenceInputLayer(numFeatures, "Normalization","rescale-symmetric", "Max", max(XTrain{1},[],2),"Min", min(XTrain{1},[],2))
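With multiple training sequences, the statistics would arguably be pooled over all of them rather than taken from XTrain{1} alone; a minimal sketch (XAll is an assumed helper name, not from the original code):
% Pool per-feature statistics over every training sequence
XAll = cat(2,XTrain{:}); % numFeatures x total timesteps
sequenceInputLayer(numFeatures,"Normalization","rescale-symmetric", ...
    "Max",max(XAll,[],2),"Min",min(XAll,[],2))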
In general, it is not necessary to normalize the output data of an LSTM network; whether it helps depends on the specific task and the range of values the output can take. If the output values span a wide range, normalization can be beneficial, particularly when the outputs have high variance or are sensitive to scale differences.
Hope this helps!
  1 Comment
Patrick Sontheimer on 18 Oct 2023
Hello Neha,
thank you for your answer; it's much appreciated. I think you're right. Additionally, I used several sequences and so far only normalized with the first sequence's max and min values as a stopgap measure. From now on I'll normalize all values without the sequenceInputLayer options, and I'll calculate the min and max values from all combined sequences, both for the input and the output. Scale difference is a concern for the outputs. I've asked around and was told I might appreciate using a custom loss function to address the issue (see the sketch below), as it can weight differently scaled outputs with different priorities and removes the need to normalize the output, which makes the RMSE interpretable again (same dimension as the output values). I still have issues with the accuracy of my models, but I think the original question has been addressed.
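For reference, a minimal sketch of such a custom loss as a weighted-MSE regression output layer; the class name and the per-response weighting scheme are illustrative assumptions, not something from this thread. In the classic trainNetwork workflow it would replace regressionLayer at the end of the layer array:
% Hypothetical weighted-MSE output layer (sketch): per-response weights let
% differently scaled outputs contribute comparably without normalizing Y.
classdef weightedRegressionLayer < nnet.layer.RegressionLayer
    properties
        Weights % [numResponses x 1] per-response weights (assumption)
    end
    methods
        function layer = weightedRegressionLayer(weights,name)
            layer.Name = name;
            layer.Weights = weights(:);
        end
        function loss = forwardLoss(layer,Y,T)
            % Y,T have size numResponses x miniBatchSize x sequenceLength
            loss = sum(layer.Weights .* (Y - T).^2,'all') / numel(T);
        end
    end
end
The weights could, for example, be the inverse variances of each response computed over the training targets, so that large-magnitude responses do not dominate the loss.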

Release: R2022b