How to split a dataset into 3 sets using splitEachLabel with percentages such that each class appears in all 3 sets?
I have an image dataset with around 100 classes. The largest class has 59 images and the smallest has 5. I tried to split the data into training, validation and test sets with the following statement:
[imdsTrain, imdsValidation, imdsTest] = splitEachLabel(imds, 0.75, 0.15, 'randomized');
I got an error saying that the training and validation data must have the same labels.
I checked the imds and found that for classes with few images, such as 5, it puts 4 in the training set and the remaining 1 sometimes in the validation set and sometimes in the test set. As a result, not every class in the training set appears in the validation or test set.
I worked around it by increasing the validation proportion from 0.15 to 0.2, but that doesn't seem like a good solution.
Is there a way to split the dataset so that every class is present in all 3 sets? Preferably I want to do it with percentages, rather than with integer counts that always put exactly 1 image per class in the validation and test sets.
Answers (1)
Anmol Dhiman
on 3 Jul 2020
Edited: Anmol Dhiman
on 3 Jul 2020
Hi Faisal,
The second argument (0.75) in splitEachLabel is the proportion of files to split from each label, specified as a scalar in the interval (0,1) or a positive integer scalar. You can change its value to suit your problem.
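If tuning the proportions alone still leaves small classes out of one set, one possible workaround is to compute the per-class counts yourself and put a floor of one file on the validation and test sets. The following is only a minimal sketch under stated assumptions, not how splitEachLabel works internally: the variable names (pTrain, pVal, idxTrain, idxVal, idxTest) and the toy labels are made up for illustration, and the commented-out lines at the end assume a release in which ImageDatastore supports subset.

```matlab
% Sketch: per-class split with at least one file per class in every set.
% pTrain, pVal and the toy labels are illustrative assumptions.
pTrain = 0.75;
pVal   = 0.15;                          % the remainder (about 0.10) goes to test

% Toy labels standing in for imds.Labels: 5 files of class 'a', 59 of class 'b'.
labels = categorical([repmat({'a'}, 5, 1); repmat({'b'}, 59, 1)]);

classes  = categories(labels);
idxTrain = []; idxVal = []; idxTest = [];
for k = 1:numel(classes)
    idx = find(labels == classes{k});   % files belonging to this class
    idx = idx(randperm(numel(idx)));    % shuffle within the class
    n   = numel(idx);
    assert(n >= 3, 'Class %s needs at least 3 files.', classes{k});

    % Round the requested share, but never let a set drop to zero files.
    nVal   = max(1, round(pVal * n));
    nTest  = max(1, round((1 - pTrain - pVal) * n));
    nTrain = n - nVal - nTest;          % the training set takes the remainder
    assert(nTrain >= 1, 'No training files left for class %s.', classes{k});

    idxTrain = [idxTrain; idx(1:nTrain)];
    idxVal   = [idxVal;   idx(nTrain+1 : nTrain+nVal)];
    idxTest  = [idxTest;  idx(nTrain+nVal+1 : end)];
end

% With a real datastore (assumes ImageDatastore supports subset):
% imdsTrain      = subset(imds, idxTrain);
% imdsValidation = subset(imds, idxVal);
% imdsTest       = subset(imds, idxTest);
```

Large classes still follow the requested percentages closely. Only the smallest classes deviate, because one file each is reserved for validation and test before the training set takes the rest.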
Regards,
Anmol Dhiman