Different training results for neural network when using full dataset versus partial dataset
Katy on 20 Jul 2023
Commented: Mrutyunjaya Hiremath on 24 Jul 2023
I'm training a network using 'narxnet' and 'train'.
My training data is a part of a larger dataset. These are the two scenarios in which I get different results.
- Trim the dataset so the entire input data is the training data. 'trainInd' = the entire dataset; no validation or test indices are provided
- Use the entire dataset, but specify the training data by 'trainInd' (using the indices of the exact data from scenario 1); no validation or test indices are provided
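For reference, the two setups look roughly like this (a sketch only — the variable names X, T, idx and the network sizes are placeholders, not my actual code):

```matlab
% X, T: full dataset as cell arrays of timesteps; idx: indices of the
% training subset within the full dataset (placeholder names).
net = narxnet(1:2, 1:2, 10);   % input delays, feedback delays, hidden units

% Scenario 1: trim the dataset first, then train on everything.
Xtrim = X(idx);  Ttrim = T(idx);
net1 = net;
net1.divideFcn = 'divideind';
net1.divideParam.trainInd = 1:numel(idx);
net1.divideParam.valInd   = [];
net1.divideParam.testInd  = [];
[x1, xi1, ai1, t1] = preparets(net1, Xtrim, {}, Ttrim);
net1 = train(net1, x1, t1, xi1, ai1);

% Scenario 2: keep the full dataset, select the same samples via trainInd.
% Note: 'divideind' indices apply to the series AFTER preparets removes the
% initial delay states, which is one place the two setups can diverge.
net2 = net;
net2.divideFcn = 'divideind';
net2.divideParam.trainInd = idx;
net2.divideParam.valInd   = [];
net2.divideParam.testInd  = [];
[x2, xi2, ai2, t2] = preparets(net2, X, {}, T);
net2 = train(net2, x2, t2, xi2, ai2);
```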
The training terminates under the same conditions and uses the same data, yet I get different results. I've also experimented with adjusting the training indices in scenario 2 to account for the number of delays specified, with no luck.
Does anyone have any insight into what might be causing this? (I'm aware of the issues with not specifying validation data; I'm just trying to replicate the behavior at the moment.)
Accepted Answer
Mrutyunjaya Hiremath on 21 Jul 2023
The difference in results between scenario 1 and scenario 2 could be due to the different order of data samples seen during training. When you trim the dataset so that the entire input data is used for training (scenario 1), the network sees the data in the same order as it appears in the dataset. However, when you specify the training data using the indices (scenario 2), the network sees the data in a different order based on the selected indices.
In a neural network, the order in which data samples are presented during training can have an impact on the convergence and final performance of the model. Different orders of data samples can lead to different weight updates during training, potentially resulting in slightly different results.
To address this issue and ensure more consistent results, you can try the following:
- Shuffle the dataset: Before creating the neural network and specifying trainInd in scenario 2, shuffle the entire dataset randomly. This helps randomize the order of data samples and can lead to more consistent training.
- Set the random seed: If you are using a random number generator during training (e.g., for weight initialization or mini-batch shuffling), set a fixed random seed before running both scenarios. This ensures that the randomization during training is the same for both scenarios, leading to more reproducible results.
By shuffling the dataset and setting the random seed, you should get more consistent results between scenario 1 and scenario 2. Keep in mind that neural networks are still sensitive to other factors such as network architecture, learning rate, and training parameters, so slight differences are possible even with these measures in place. However, the consistency should improve.
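A minimal sketch of the seeding step (the seed value and variable names are arbitrary; x and t stand for your prepared inputs and targets):

```matlab
% Fix the random number generator so that weight initialization (and any
% random data division) is identical across runs.
rng(0, 'twister');           % any fixed seed works; 0 is arbitrary
net = narxnet(1:2, 1:2, 10);
net = configure(net, x, t);  % set layer sizes from the data
net = init(net);             % reinitialize weights from the seeded RNG
net = train(net, x, t);
```

Call rng with the same seed immediately before each of the two scenarios so both runs start from identical weights.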