Custom mini-batches when using 'trainNetwork' - Accuracy/Loss much noisier

Hi,
I'm using the "trainNetwork" function to train a neural net on numeric data for classification. However, I have to feed it custom mini-batches rather than my entire dataset matrix, because the full matrix is too big and crashes out. I have code that randomly selects a unique mini-batch from my dataset. I set the 'MiniBatchSize' training option to the size of the mini-batch matrix I'm feeding it, and 'MaxEpochs' to 1, so that it runs each batch exactly once and moves on. I then re-train the net on each batch in an iterative loop. The idea is coded below.
The get_MiniBatch function below is only for illustrative purposes, and the last column of miniBatch contains the labels.
for epochIdx = 1 : maxNumEpochs
    for miniBatchIdx = 1 : NumMiniBatches
        % Draw the next mini-batch; the last column holds the labels
        miniBatch = get_MiniBatch(DATA);
        options = trainingOptions('adam', ...
            'MiniBatchSize', size(miniBatch,1), 'MaxEpochs', 1, 'Verbose', 0);
        [Net, trainingMetrics] = trainNetwork(miniBatch(:,1:end-1), ...
            categorical(miniBatch(:,end)), layers, options);
        % Warm-start the next call from the updated layers
        layers = Net.Layers;
    end
end
However, I noticed that when I plot trainingMetrics.TrainingLoss and trainingMetrics.TrainingAccuracy across all mini-batches, they are much noisier than when I give "trainNetwork" all my data, let it run through the mini-batches automatically, and look at the built-in progress plots of training accuracy and loss (unsmoothed). Am I updating the weights correctly just by reassigning the layers at the end of each loop iteration, as I have here? Or should I also be updating other parameters?
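For reference, this is how I collect the per-mini-batch metrics for plotting (a sketch based on the loop above; the movmean window of 50 is an arbitrary choice):

```matlab
% Log the final loss/accuracy of each single-epoch trainNetwork call
allLoss = zeros(1, maxNumEpochs * NumMiniBatches);
allAcc  = zeros(1, maxNumEpochs * NumMiniBatches);
k = 0;
for epochIdx = 1 : maxNumEpochs
    for miniBatchIdx = 1 : NumMiniBatches
        miniBatch = get_MiniBatch(DATA);
        options = trainingOptions('adam', ...
            'MiniBatchSize', size(miniBatch,1), 'MaxEpochs', 1, 'Verbose', 0);
        [Net, trainingMetrics] = trainNetwork(miniBatch(:,1:end-1), ...
            categorical(miniBatch(:,end)), layers, options);
        layers = Net.Layers;
        k = k + 1;
        allLoss(k) = trainingMetrics.TrainingLoss(end);
        allAcc(k)  = trainingMetrics.TrainingAccuracy(end);
    end
end

% Compare the raw curve with a smoothed version
figure;
plot(allLoss); hold on;
plot(movmean(allLoss, 50));  % smoothing window of 50 is arbitrary
legend('Raw mini-batch loss', 'Moving average');
```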
I'm sure it would be much easier just to give "trainNetwork" all the data and let it do everything, but I have to do the mini-batches in loops to reduce the computational cost for now.
Thanks.

Accepted Answer

Nomit Jangid
Nomit Jangid on 3 Dec 2020
Hi Terence,
This is the expected behavior. Training on a single batch containing all the data gives a smoother curve. Training on one big batch isn't required, as mini-batches can also train the network very effectively. Some advantages of using mini-batches rather than the full dataset are that the full dataset is more computationally expensive per update, and its gradient trajectory can land you in a saddle point.
You can check the following links to know more about this.
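As a quick standalone illustration of the noise (not from the thread, just a sketch): the variance of a mini-batch average shrinks as the batch size grows, so smaller batches give noisier loss/gradient estimates.

```matlab
% Stand-in values playing the role of per-sample gradients
g = randn(1e5, 1);

% Estimate the mean from 1000 random mini-batches of size B
batchMeans = @(B) arrayfun(@(i) mean(g(randperm(numel(g), B))), 1:1000);

% The smaller batch yields a noticeably noisier estimate of the true mean
fprintf('std of   32-sample means: %.4f\n', std(batchMeans(32)));
fprintf('std of 1024-sample means: %.4f\n', std(batchMeans(1024)));
```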
  1 Comment
Terence
Terence on 3 Dec 2020
Hi Nomit,
Thanks for your reply. I actually am only training on mini-batches. Instead of giving the trainNetwork function all my data and letting it do the mini-batch work for me, I am arranging the mini-batches myself and then feeding them one at a time into trainNetwork. On each iteration of the loop I feed it a new mini-batch. So in a sense, trainNetwork thinks I'm giving it all the data each time, but I'm actually giving it a different mini-batch at each iteration.
Although I update the layers after every iteration, something seems incorrect with the training accuracy as the iterations continue: the accuracy and loss begin to look quite erratic. So I guess trainNetwork is treating each mini-batch as completely new data and starting from scratch for each of my mini-batches?
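One sanity check I can run (a sketch; the layer index 2 is an assumption, so adjust it to wherever a learnable layer sits in your architecture) is to confirm the learned weights really do carry over between calls. Note that even if the weights carry over, optimizer state such as Adam's moment estimates restarts with each fresh trainNetwork call, which could contribute to the erratic curves:

```matlab
% Record the weights of a learnable layer before the next training call
wBefore = Net.Layers(2).Weights;   % index 2 assumed; pick a layer with Weights

% Retrain for one epoch on the next mini-batch, warm-starting from Net.Layers
layers = Net.Layers;
miniBatch = get_MiniBatch(DATA);
[Net, ~] = trainNetwork(miniBatch(:,1:end-1), categorical(miniBatch(:,end)), ...
    layers, options);

% If warm-starting works, wAfter differs from wBefore only by small updates,
% not by a fresh random re-initialization
wAfter = Net.Layers(2).Weights;
disp(max(abs(wAfter(:) - wBefore(:))));
```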
Thanks,
Terence


More Answers (0)
