MATLAB Answers

How does MATLAB Classification Learner calculate a model's accuracy?

Hi, here is my question: I trained a series of classifiers using MATLAB Classification Learner and then tried to replicate the same results with my own code. I'm pretty sure I set all the parameters exactly the same way and used the same functions; I checked this by generating code from Classification Learner and confirming that my code used the same parameters.
The problem is that every time I train a model on the same data set, Classification Learner gives me the same accuracy, whereas the accuracy changes from run to run when I use my own code. This is due to the randomness associated with the cross-validation step. So I assume that Classification Learner runs the cross-validation step multiple times and then reports the average cross-validation accuracy. Am I right?
Interestingly, if I run the code generated through the GUI several times, the accuracy changes every time there as well. So my question is: how does the GUI get a constant accuracy? If it averages the cross-validation results, how many repetitions does it use?
Thanks,


Accepted Answer

Luuk van Oosten on 23 Sep 2016
Dear Alessandro,
Reproducible results can be obtained by using a RandStream with substreams while building a classifier. I suspect that Classification Learner uses a fixed random number stream internally, but that this setting is not exposed in the GUI.
See, for example, the TreeBagger documentation, which explains how to obtain reproducible results under Description > 'Options'.
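
For the cross-validation case in the question, a simpler route than managing streams explicitly is to seed the global generator with rng before the folds are drawn. The following is a minimal sketch, not a description of what Classification Learner does internally: the fisheriris sample data, the decision tree model, the seed value 1, and the 5 folds are all assumptions chosen for illustration.

```matlab
% Sketch only: dataset, model type, seed, and fold count are assumptions
% chosen for illustration, not taken from Classification Learner.
load fisheriris                          % sample data: meas (150x4), species (150x1 cell)

mdl = fitctree(meas, species);           % another fitc* model could be used instead

rng(1);                                  % seed the global generator before partitioning
cv1  = crossval(mdl, 'KFold', 5);        % fold assignment draws from the global stream
acc1 = 1 - kfoldLoss(cv1, 'LossFun', 'classiferror');

rng(1);                                  % same seed, so the same folds are drawn
cv2  = crossval(mdl, 'KFold', 5);
acc2 = 1 - kfoldLoss(cv2, 'LossFun', 'classiferror');

fprintf('Run 1: %.4f, run 2: %.4f\n', acc1, acc2);
```

With the two rng(1) calls removed, the two accuracies will generally differ from each other, which matches the behaviour described in the question.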

  3 Comments

Alessandro Napoli on 23 Sep 2016
Hi Luuk,
Thanks for your reply. It makes a lot of sense. I have used the rng function for that very purpose in the past. I just didn't think they would take that approach to obtain the cross-validation accuracy.
Thanks again.
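
The rng approach mentioned above can be sketched as follows for a learner that is itself randomized. This is only an illustration: the fisheriris sample data, the 30 trees, and the use of the default seed are assumptions, not details from this thread.

```matlab
% Sketch only: data, number of trees, and seed are illustrative assumptions.
load fisheriris

rng('default');                          % reset the global generator to its startup state
b1   = TreeBagger(30, meas, species, 'OOBPrediction', 'on');
err1 = oobError(b1, 'Mode', 'ensemble'); % out-of-bag misclassification rate

rng('default');                          % same generator state before the second fit
b2   = TreeBagger(30, meas, species, 'OOBPrediction', 'on');
err2 = oobError(b2, 'Mode', 'ensemble');

fprintf('OOB error, run 1: %.4f, run 2: %.4f\n', err1, err2);
```

Resetting the generator immediately before each fit is what matters here; any random draw made between the rng call and the fit would shift the state and change the result.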
Alessandro Napoli on 23 Sep 2016
Just for completeness: when PCA is used for feature reduction before training the classifier, the accuracy varies even more from run to run.
Lydia Ashton on 15 Dec 2016
Do you know of any documentation on how to control this randomness for classification models using logistic regression? Thanks!
