How to include all variables in each decision tree of an ensemble?
3 views (last 30 days)
Show older comments
Hi everyone. I am fitting the following 10-tree ensemble.
X = rand(1000,50);
Y = rand(1000,1);
N = size(X,2);
Ntrees=10;
t = templateTree('NumVariablesToSample','all');
Mdl = fitrensemble(X,Y,'Method','LSBoost','Learners',t,'NumLearningCycles',Ntrees);
Below I extract the number of variables that are included in each of the 10 trees.
z = false(N,Ntrees);
for i = 1:Ntrees
idx = unique(Mdl.Trained{i}.CutPredictorIndex);
idx(idx==0)=[];
z(idx,i) = 1;
end
sum(z)
>> ans =
8 10 8 10 9 9 10 8 9 9
Despite setting 'NumVariablesToSample’ to ‘all’, when I extract the variables included in each tree, only 8-10 out of the 50 features are included in each tree. Does anyone have a suggestion on how to force all variables to be included in all trees? Thanks.
0 Comments
Answers (1)
Aditya Patil
on 16 Feb 2021
'NumVariablesToSample' defines the number of variables(predictors) which will be considered at any given split. The decision tree algorithm picks random set of predictors, and then selects one of them, based on certain criterias.
It might not be necessary, or sometimes even possible, to use a specific variable in a tree. For example, consider if a prior split leaves samples of only one class. In such a case, selecting a decision boundary for that variable will not be possible.
If you need to use all variables, you can look at some of the other classification algorithms available in MATLAB, such as SVM.
0 Comments
See Also
Categories
Find more on Classification Ensembles in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!