Classification error by cross validation
E = cvloss(tree) returns the cross-validated classification error (loss) for
tree, a trained classification tree. The cvloss method uses stratified
partitioning to create cross-validated sets. That is, for each fold, each
partition of the data has roughly the same class proportions as in the data
used to train tree.
Specify optional comma-separated pairs of Name,Value arguments. Name is
the argument name and Value is the corresponding value.
Name must appear inside quotes. You can specify several name and value
pair arguments in any order as Name1,Value1,...,NameN,ValueN.
'TreeSize' — Tree size
Tree size, specified as the comma-separated pair consisting of 'TreeSize'
and one of the following values:
'se' (default) — cvloss uses the smallest tree whose cost is within one standard error of the minimum cost.
'min' — cvloss uses the minimal cost tree.
'KFold' — Number of cross-validation samples
Number of cross-validation samples, specified as the comma-separated pair consisting of 'KFold' and a positive integer value greater than 1. The default is 10.
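As a minimal sketch of how these name-value pairs combine, the call below requests the minimal-cost tree and 5-fold partitioning. It assumes the ionosphere data set used in the examples on this page.

```matlab
% Sketch: combine the 'TreeSize' and 'KFold' name-value pairs.
load ionosphere                    % predictor matrix X, class labels Y
Mdl = fitctree(X,Y);               % grow a classification tree
rng(1)                             % for reproducibility
E = cvloss(Mdl,'TreeSize','min','KFold',5)
```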
E — Cross-validation classification error
Cross-validation classification error (loss), returned as a
vector or scalar depending on the setting of the 'Subtrees' name-value pair.
SE — Standard error
Standard error of E, returned as a vector
or scalar depending on the setting of the 'Subtrees' name-value pair.
Nleaf — Number of leaf nodes
Number of leaf nodes in tree, returned
as a vector or scalar depending on the setting of the 'Subtrees' name-value
pair. Leaf nodes are terminal nodes, which give classifications, not
splits.
BestLevel — Best pruning level
Best pruning level, returned as a scalar value. By default, BestLevel is
a scalar representing the largest pruning level that achieves a value of E within one
SE of the minimum
error. If you set 'TreeSize' to 'min', BestLevel is the level corresponding to
the smallest value in E.
Compute the cross-validation error for a default classification tree.
Load the ionosphere data set.
load ionosphere
Grow a classification tree using the entire data set.
Mdl = fitctree(X,Y);
Compute the cross-validation error.
rng(1); % For reproducibility
E = cvloss(Mdl)
E = 0.1168
E is the 10-fold misclassification error.
Apply k-fold cross-validation to find the best level to prune a classification tree over all of its subtrees.
Load the ionosphere data set.
load ionosphere
Grow a classification tree using the entire data set. View the resulting tree.
Mdl = fitctree(X,Y);
view(Mdl,'Mode','graph')
Compute the 5-fold cross-validation error for each subtree except for the highest pruning level. Specify to return the best pruning level over all subtrees.
rng(1); % For reproducibility
m = max(Mdl.PruneList) - 1
m = 7
[E,~,~,bestLevel] = cvloss(Mdl,'Subtrees',0:m,'KFold',5)
E = 8×1

    0.1282
    0.1254
    0.1225
    0.1282
    0.1282
    0.1197
    0.0997
    0.1738
bestLevel = 6
Of the 7 pruning levels, the best pruning level is level 6.
Prune the tree to the best level. View the resulting tree.
MdlPrune = prune(Mdl,'Level',bestLevel);
view(MdlPrune,'Mode','graph')
You can construct a cross-validated tree model with crossval, and call
kfoldLoss instead of cvloss.
If you are going to examine the cross-validated tree more than once,
then the alternative can save time. However, unlike kfoldLoss, cvloss
does not allow you to examine any error other than the classification
error.
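The alternative described above can be sketched as follows, again using the ionosphere data set from the examples on this page. The cross-validated model returned by crossval can be reused for repeated loss computations.

```matlab
% Sketch of the crossval/kfoldLoss alternative to cvloss.
load ionosphere                    % predictor matrix X, class labels Y
rng(1)                             % for reproducibility
Mdl = fitctree(X,Y);               % grow a classification tree
CVMdl = crossval(Mdl,'KFold',5);   % cross-validated model; keep and reuse
L = kfoldLoss(CVMdl)               % cross-validated classification error
```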