Why is data discarded in the shuffle operation when training a deep network?

When training a deep learning network, if the batch size does not evenly divide the number of training samples, then the training data that does not fit into the final batch of each epoch is discarded. Why does this limitation exist? Why is part of the training data discarded?
Setting the shuffle training option to "every-epoch" does not prevent discarding data; it just avoids discarding the same data every epoch.

Answers (1)

Aravind
Aravind on 30 Jan 2025
When training deep learning networks in MATLAB, if the batch size does not evenly divide the number of training samples, the leftover data that cannot fill a complete batch at the end of each epoch will be discarded. This is explained in the documentation here: https://www.mathworks.com/help/releases/R2022a/deeplearning/ref/trainingoptions.html#d123e146068. Under the “Shuffle” option, it is recommended to set the value to “every-epoch” to avoid discarding the same data each time.
Here are some reasons for this behavior:
  • Batch Processing Consistency: MATLAB's deep learning framework, similar to others, is optimized for processing batches of a consistent size, which enhances computational efficiency and fully utilizes parallel processing capabilities, particularly on GPUs.
  • Gradient Estimation Stability: Inconsistent, smaller batch sizes can lead to higher variance in gradient estimates, which can destabilize the convergence process during training and potentially result in less reliable learning outcomes.
This approach balances computational efficiency with the use of all available data during training.
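As a concrete illustration of how much data gets dropped (the numbers here are made up for the example, not taken from the question):

```matlab
% Example: 1000 training samples with a mini-batch size of 128.
numObservations = 1000;
miniBatchSize   = 128;

numFullBatches = floor(numObservations / miniBatchSize);   % 7 full batches (896 samples)
numDiscarded   = mod(numObservations, miniBatchSize);      % 104 samples left over

fprintf("%d full batches per epoch; %d samples discarded.\n", ...
    numFullBatches, numDiscarded);
```

With "Shuffle" set to "every-epoch", a different 104 samples are dropped each epoch, so over many epochs every sample is still seen, just not in every epoch.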
To ensure no data is discarded, you can use a custom training loop. You can define a “minibatchqueue” object with your input data to create mini-batches. By setting the “PartialMiniBatch” option to “return”, you ensure that even if the number of observations is not divisible by the mini-batch size, no data is lost, as the final mini-batch will simply contain fewer observations. You can find more information in the “minibatchqueue” documentation. You can also refer to this example on how to train a network using a custom training loop: https://www.mathworks.com/help/releases/R2022a/deeplearning/ug/train-network-using-custom-training-loop.html.
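A minimal sketch of that approach is below. It assumes image data stored in arrays "XTrain" (with observations along the fourth dimension) and labels "TTrain"; the variable names and batch size are illustrative, and the gradient/update step is elided since it depends on your model:

```matlab
% Sketch: a minibatchqueue that returns the final, smaller batch
% instead of discarding it ("PartialMiniBatch","return").
dsX = arrayDatastore(XTrain, "IterationDimension", 4);
dsT = arrayDatastore(TTrain);
ds  = combine(dsX, dsT);

mbq = minibatchqueue(ds, ...
    "MiniBatchSize",    128, ...
    "PartialMiniBatch", "return", ...        % keep the leftover batch
    "MiniBatchFormat",  ["SSCB" ""]);        % format images; labels unformatted

for epoch = 1:numEpochs
    shuffle(mbq);                            % reshuffle every epoch
    while hasdata(mbq)
        [X, T] = next(mbq);                  % last batch may have fewer observations
        % ... evaluate gradients with dlfeval and update the network ...
    end
end
```

Note that the final batch being smaller means its gradient estimate is noisier, which is one reason the built-in training loop drops it by default.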
I hope this answers your question.

Release

R2020a
