What is the difference between oobPredict and predict with ensemble of bagged decision trees?

1- I am using both functions to predict a response with a random forest, but the predict function gives a higher percentage of explained variance than oobPredict. Why is that? I think there is something fundamental that I have not yet fully grasped.
2- If these methods differ in the way they weigh trees, how can I make them homogeneous?
3- Can oobPredict be used in some way to make predictions on a new set of data?

Accepted Answer

Malay Agarwal on 26 Aug 2024
Edited: Malay Agarwal on 26 Aug 2024
The "oobPredict" function is used to get a more realistic estimate of the performance of the model. For each data sample, the function only considers those trees for which the sample was out-of-bag during training. In other words, it only considers those trees which have not seen the sample during training. Since these trees have not seen the sample, their predictions are more likely to be off, and those errors count towards the model's error. This leads to a lower percentage of explained variance.
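On the other hand, the "predict" function uses all the trees to obtain a prediction for a sample. If the sample is from the training set, many of those trees have already seen it during training (with bootstrap sampling with replacement, a given sample lands in roughly 63% of the trees on average). Those trees have fitted the sample directly, so the ensemble appears to account for more of the variance in the dataset than it would on unseen data.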
This is similar to having a training set and a validation set when training a neural network (https://en.wikipedia.org/wiki/Training,_validation,_and_test_data_sets). The network will typically report a higher error and explain less of the variance on the validation set, since the model is not explicitly trained on those samples. The out-of-bag samples act as the validation set, since only those trees which haven't seen the sample during training have a say in the final prediction.
This is explained in the documentation of "oobPredict" (https://www.mathworks.com/help/stats/treebagger.oobpredict.html#bu0qyz1-2), albeit in a less direct manner:
"For each observation that is out of bag for at least one tree, oobPredict composes the weighted mean of the class posterior probabilities by selecting the trees in which the observation is out of bag."
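The gap described above can be reproduced on a toy problem. The following is only a minimal sketch: the data is synthetic, and the variable names and the helper "r2" are made up for illustration. It assumes a regression forest trained with "OOBPrediction" set to "on".

```matlab
% Synthetic regression data (hypothetical; replace with your own X and Y)
rng(0);                                   % for reproducibility
X = rand(500, 5);
Y = 3*X(:,1) + sin(2*pi*X(:,2)) + 0.3*randn(500, 1);

% 'OOBPrediction','on' stores which samples are out of bag for each tree
Mdl = TreeBagger(100, X, Y, 'Method', 'regression', 'OOBPrediction', 'on');

yResub = predict(Mdl, X);   % all trees vote, including trees trained on the sample
yOob   = oobPredict(Mdl);   % only trees for which the sample was out of bag

% Explained variance (R^2) for both sets of predictions
r2 = @(y, yhat) 1 - sum((y - yhat).^2) / sum((y - mean(y)).^2);
fprintf('R^2, predict on training data: %.3f\n', r2(Y, yResub));
fprintf('R^2, oobPredict:               %.3f\n', r2(Y, yOob));
```

On data like this, the first number would normally be expected to come out higher than the second, for the reason given above.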
I don't think there is any way to make the outputs more homogeneous, since "oobPredict" will always choose a different set of trees to make a prediction for a sample than the "predict" function does. You can try experimenting with the "TreeWeights" name-value argument, but I think that's unlikely to work, since it only defines how the trees are weighted in the overall calculation of the prediction and does not affect which trees take part in the prediction.
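To see that the difference comes from which trees take part rather than from how they are weighted, you can inspect the "OOBIndices" property of the trained model. The sketch below is an experiment rather than a recommendation: the data is synthetic, and it assumes that "predict" accepts the "UseInstanceForTree" name-value argument as a logical matrix with one row per observation and one column per tree. I have not verified that this matches "oobPredict" exactly in every case, so the code prints the difference instead of assuming it.

```matlab
% Synthetic regression data (hypothetical; replace with your own X and Y)
rng(0);
X = rand(300, 4);
Y = 2*X(:,1) - X(:,3) + 0.2*randn(300, 1);

Mdl = TreeBagger(100, X, Y, 'Method', 'regression', 'OOBPrediction', 'on');

% oob(i,t) is true when observation i was out of bag for tree t
oob = Mdl.OOBIndices;
fprintf('Average fraction of trees that are out of bag per sample: %.3f\n', ...
    mean(oob(:)));

% Restrict each observation to its out-of-bag trees (assumed usage of
% 'UseInstanceForTree') and compare with oobPredict
yManual = predict(Mdl, X, 'UseInstanceForTree', oob);
yOob    = oobPredict(Mdl);
fprintf('Largest absolute difference from oobPredict: %g\n', ...
    max(abs(yManual - yOob)));
```

With bootstrap sampling with replacement, the average fraction printed first should be somewhere around 37%, which is why each out-of-bag prediction rests on far fewer trees than a call to "predict" over the full ensemble.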
Coming to your last question, the "oobPredict" function does not support making predictions on new data. It exists only to evaluate the model's performance on the training data by giving a less biased estimate of its error. For new data, please use the "predict" function.
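As a minimal sketch of that workflow (synthetic data; "Xnew" is a hypothetical set of new observations with the same predictors, in the same order, as the training data):

```matlab
% Synthetic training data (hypothetical; replace with your own X and Y)
rng(1);
X = rand(300, 4);
Y = 2*X(:,1) - X(:,3) + 0.2*randn(300, 1);

Mdl = TreeBagger(100, X, Y, 'Method', 'regression', 'OOBPrediction', 'on');

% Out-of-bag estimate of the error (mean squared error for regression),
% computed from the training data only
oobMse = oobError(Mdl, 'Mode', 'ensemble');
fprintf('Out-of-bag MSE of the full ensemble: %.4f\n', oobMse);

% New data never took part in training, so every tree can be used
Xnew = rand(10, 4);
yNew = predict(Mdl, Xnew);
disp(yNew);
```

Here "oobError" gives the performance estimate, while "predict" produces the actual predictions for the unseen observations.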
Hope this helps!
