Neural network: train() behavior with earlier results

Akshay Joshi on 28 Jan 2018
Closed: MATLAB Answer Bot on 20 Aug 2021
I have a very large dataset of around 150 GB that I need to process using neural networks. Because the data is too big to load at once, I have to break it into chunks; for example, 5000 elements would be sent as 20 batches of 250 elements each. The following dummy code illustrates the idea:
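batch_size = 250;    % elements per batch
num_batches = 20;    % 20 * 250 = 5000 elements in total
for count = 1:num_batches
    % Column range of the current batch (train() expects one sample per column)
    idx = (1 + (count-1)*batch_size):(count*batch_size);
    inputs = entire_input(:, idx);
    targets = entire_targets(:, idx);
    net = train(net, inputs, targets);
end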
Will the net start training from scratch with each fresh batch, or will it retain the weights calculated for the previous batch? From some of my discussions and findings, it seems that with each new batch the weights adapt to the current data and may overwrite what was learned from earlier batches.
Please advise whether this method works well, or whether some other method should be used instead of train().

Answers (1)

Greg Heath on 29 Jan 2018
"Need to process" doesn't provide useful information.
What are you trying to design? Curvefitter/Regressor? PatternRecognizer/Unsupervised-Classifier/Supervised-Classifier? Timeseries??
In all cases, training, validation and test data should have similar summary statistics in all run batches. Otherwise training batch n will erase some of what is learned in batches 1 to n-1.
Your response should be far less vague than your original explanation.
Hope this helps.
Greg
Thank you for formally accepting my answer
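One way to move toward batches with similar summary statistics is to assign samples to batches at random instead of cutting the data into contiguous blocks. The following is only a minimal sketch under stated assumptions: the sizes come from the question (5000 samples, 20 batches of 250), and `x` is a made-up, deliberately sorted feature that stands in for real data. Only the index vector is permuted, so the full dataset never has to be held in memory for this step.

```matlab
% Sizes taken from the question: 5000 samples in 20 batches of 250.
num_samples = 5000;
batch_size  = 250;
num_batches = num_samples / batch_size;

rng(0);                                              % fixed seed for a reproducible split
perm      = randperm(num_samples);                   % random order of all sample indices
batch_idx = reshape(perm, batch_size, num_batches);  % column k holds the indices of batch k

% Stand-in data (assumption): a sorted feature, so contiguous chunks differ strongly.
x = linspace(0, 1, num_samples);
contiguous_means = mean(reshape(x, batch_size, num_batches), 1);
shuffled_means   = mean(x(batch_idx), 1);

% The per-batch means vary far less across batches after shuffling.
fprintf('std of batch means, contiguous: %.4f\n', std(contiguous_means));
fprintf('std of batch means, shuffled:   %.4f\n', std(shuffled_means));
```

In a real run, `batch_idx(:, k)` would select the columns (samples) to read from disk for batch k.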
  1 Comment
Akshay Joshi on 29 Jan 2018
Edited: Akshay Joshi on 29 Jan 2018
Hi Greg,
I'm trying to design a supervised classifier using a multilayer perceptron (feedforwardnet). The input matrix has dimensions 500,000 x 25, and the output matrix 5,000 x 25.
Initially, I tried to train my network using nntool, but I was unable to feed a dataset this large (150 GB) into it due to memory constraints, so I decided to break the data into chunks. For this purpose, I'm writing a MATLAB script that creates the neural network and provides the input in chunks.
"In all cases, training, validation and test data should have similar summary statistics in all run batches. Otherwise training batch n will erase some of what is learned in batches 1 to n-1."
Can you suggest a method that retains what was learned from batches 1 to n-1, so that, based on it, we can compute the results for batches n to n+k?
Thanks for the earlier response. Hope I'm clear this time.
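One possible approach, sketched here under assumptions rather than offered as a confirmed answer from this thread, is to call train() on the same net object for every chunk, run only a few epochs per chunk, and sweep over all chunks several times in random order so that no single chunk dominates the final weights. The data `X` and `T`, the hidden layer size, the epoch count and `num_passes` are all hypothetical placeholders; the Deep Learning Toolbox is required.

```matlab
% Synthetic stand-in data (assumption): 25 features, 5000 samples, one sample per column.
rng(0);
X = rand(25, 5000);
T = double(sum(X, 1) > 12.5);            % hypothetical binary target

batch_size  = 250;
num_batches = size(X, 2) / batch_size;   % 20 batches
num_passes  = 3;                         % hypothetical number of sweeps over all batches

net = feedforwardnet(10);                % hidden layer size is an arbitrary choice
net = configure(net, X(:, 1:batch_size), T(:, 1:batch_size));  % sets layer sizes so weights exist
net.trainParam.epochs     = 5;           % only a few epochs per batch (assumption)
net.trainParam.showWindow = false;       % suppress the training GUI
wb_start = getwb(net);                   % weights and biases before any training

for pass = 1:num_passes
    order = randperm(num_batches);       % new random batch order on every pass
    for count = order
        idx = (1 + (count-1)*batch_size):(count*batch_size);
        % The returned net is passed back in, so each call starts from the current weights.
        net = train(net, X(:, idx), T(:, idx));
    end
end
wb_end = getwb(net);                     % weights and biases after all passes
```

With real data, the two indexing expressions inside the loop would be replaced by reading chunk `count` from disk. Whether earlier chunks are still partly overwritten depends on how similar the chunks are, which is the concern raised in the answer.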
