# CompactClassificationTree

Package: classreg.learning.classif

Compact classification tree

## Description

Compact version of a classification tree (of class `ClassificationTree`). The compact version does not include the data for training the classification tree. Therefore, you cannot perform some tasks with a compact classification tree, such as cross validation. Use a compact classification tree for making predictions (classifications) of new data.

## Construction

`ctree = compact(tree)` constructs a compact decision tree from a full decision tree.

### Input Arguments

`tree`
A decision tree constructed using `fitctree`.

## Properties

`CategoricalPredictors`
Categorical predictor indices, specified as a vector of positive integers. `CategoricalPredictors` contains index values indicating that the corresponding predictors are categorical. The index values are between 1 and `p`, where `p` is the number of predictors used to train the model. If none of the predictors are categorical, then this property is empty (`[]`).

`CategoricalSplit`
An n-by-2 cell array, where `n` is the number of categorical splits in `tree`. Each row in `CategoricalSplit` gives left and right values for a categorical split. For each branch node with categorical split `j` based on a categorical predictor variable `z`, the left child is chosen if `z` is in `CategoricalSplit(j,1)` and the right child is chosen if `z` is in `CategoricalSplit(j,2)`. The splits are in the same order as nodes of the tree. Nodes for these splits can be found by running `cuttype` and selecting `'categorical'` cuts from top to bottom.

`Children`
An n-by-2 array containing the numbers of the child nodes for each node in `tree`, where n is the number of nodes. Leaf nodes have child node `0`.

`ClassCount`
An n-by-k array of class counts for the nodes in `tree`, where n is the number of nodes and k is the number of classes. For any node number `i`, the class counts `ClassCount(i,:)` are counts of observations (from the data used in fitting the tree) from each class satisfying the conditions for node `i`.

`ClassNames`
List of the elements in `Y` with duplicates removed. `ClassNames` can be a numeric vector, vector of categorical variables, logical vector, character array, or cell array of character vectors. `ClassNames` has the same data type as the data in the argument `Y`. (The software treats string arrays as cell arrays of character vectors.) If the value of a property has at least one dimension of length k, then `ClassNames` indicates the order of the elements along that dimension (e.g., `Cost` and `Prior`).
`ClassProbability`
An n-by-k array of class probabilities for the nodes in `tree`, where n is the number of nodes and k is the number of classes. For any node number `i`, the class probabilities `ClassProbability(i,:)` are the estimated probabilities for each class for a point satisfying the conditions for node `i`.

`Cost`
Square matrix, where `Cost(i,j)` is the cost of classifying a point into class `j` if its true class is `i` (the rows correspond to the true class and the columns correspond to the predicted class). The order of the rows and columns of `Cost` corresponds to the order of the classes in `ClassNames`. The number of rows and columns in `Cost` is the number of unique classes in the response. This property is read-only.

`CutCategories`
An n-by-2 cell array of the categories used at branches in `tree`, where n is the number of nodes. For each branch node `i` based on a categorical predictor variable `x`, the left child is chosen if `x` is among the categories listed in `CutCategories{i,1}`, and the right child is chosen if `x` is among those listed in `CutCategories{i,2}`. Both columns of `CutCategories` are empty for branch nodes based on continuous predictors and for leaf nodes. `CutPoint` contains the cut points for `'continuous'` cuts, and `CutCategories` contains the set of categories.

`CutPoint`
An n-element vector of the values used as cut points in `tree`, where n is the number of nodes. For each branch node `i` based on a continuous predictor variable `x`, the left child is chosen if `x < CutPoint(i)` and the right child is chosen if `x >= CutPoint(i)`. `CutPoint` is `NaN` for branch nodes based on categorical predictors and for leaf nodes. `CutPoint` contains the cut points for `'continuous'` cuts, and `CutCategories` contains the set of categories.

`CutType`
An n-element cell array indicating the type of cut at each node in `tree`, where n is the number of nodes. For each node `i`, `CutType{i}` is:

- `'continuous'` — If the cut is defined in the form `x < v` for a variable `x` and cut point `v`.
- `'categorical'` — If the cut is defined by whether a variable `x` takes a value in a set of categories.
- `''` — If `i` is a leaf node.

`CutPoint` contains the cut points for `'continuous'` cuts, and `CutCategories` contains the set of categories.

`CutPredictor`
An n-element cell array of the names of the variables used for branching in each node in `tree`, where n is the number of nodes. These variables are sometimes known as cut variables. For leaf nodes, `CutPredictor` contains an empty character vector. `CutPoint` contains the cut points for `'continuous'` cuts, and `CutCategories` contains the set of categories.

`CutPredictorIndex`
An n-element array of numeric indices for the variables used for branching in each node in `tree`, where n is the number of nodes. For more information, see `CutPredictor`.

`ExpandedPredictorNames`
Expanded predictor names, stored as a cell array of character vectors. If the model uses encoding for categorical variables, then `ExpandedPredictorNames` includes the names that describe the expanded variables. Otherwise, `ExpandedPredictorNames` is the same as `PredictorNames`.

`IsBranchNode`
An n-element logical vector that is `true` for each branch node and `false` for each leaf node of `tree`.

`NodeClass`
An n-element cell array with the names of the most probable classes in each node of `tree`, where n is the number of nodes in the tree. Every element of this array is a character vector equal to one of the class names in `ClassNames`.

`NodeError`
An n-element vector of the errors of the nodes in `tree`, where n is the number of nodes. `NodeError(i)` is the misclassification probability for node `i`.

`NodeProbability`
An n-element vector of the probabilities of the nodes in `tree`, where n is the number of nodes. The probability of a node is computed as the proportion of observations from the original data that satisfy the conditions for the node. This proportion is adjusted for any prior probabilities assigned to each class.

`NodeRisk`
An n-element vector of the risk of the nodes in the tree, where n is the number of nodes. The risk for each node is the measure of impurity (Gini index or deviance) for this node weighted by the node probability. If the tree is grown by twoing, the risk for each node is zero.

`NodeSize`
An n-element vector of the sizes of the nodes in `tree`, where n is the number of nodes. The size of a node is defined as the number of observations from the data used to create the tree that satisfy the conditions for the node.

`NumNodes`
The number of nodes in `tree`.

`Parent`
An n-element vector containing the number of the parent node for each node in `tree`, where n is the number of nodes. The parent of the root node is `0`.

`PredictorNames`
A cell array of names for the predictor variables, in the order in which they appear in `X`.

`Prior`
Numeric vector of prior probabilities for each class. The order of the elements of `Prior` corresponds to the order of the classes in `ClassNames`. The number of elements of `Prior` is the number of unique classes in the response. This property is read-only.

`PruneAlpha`
Numeric vector with one element per pruning level. If the pruning level ranges from 0 to M, then `PruneAlpha` has M + 1 elements sorted in ascending order. `PruneAlpha(1)` is for pruning level 0 (no pruning), `PruneAlpha(2)` is for pruning level 1, and so on.

`PruneList`
An n-element numeric vector with the pruning levels in each node of `tree`, where n is the number of nodes. The pruning levels range from 0 (no pruning) to M, where M is the distance between the deepest leaf and the root node.

`ResponseName`
Character vector describing the response variable `Y`.
`ScoreTransform`
Function handle for transforming scores, or character vector representing a built-in transformation function. `'none'` means no transformation; equivalently, `'none'` means `@(x)x`. For a list of built-in transformation functions and the syntax of custom transformation functions, see `fitctree`. Add or change a `ScoreTransform` function using dot notation: `ctree.ScoreTransform = 'function'` or `ctree.ScoreTransform = @function`.

`SurrogateCutCategories`
An n-element cell array of the categories used for surrogate splits in `tree`, where n is the number of nodes in `tree`. For each node `k`, `SurrogateCutCategories{k}` is a cell array. The length of `SurrogateCutCategories{k}` is equal to the number of surrogate predictors found at this node. Every element of `SurrogateCutCategories{k}` is either an empty character vector for a continuous surrogate predictor, or is a two-element cell array with categories for a categorical surrogate predictor. The first element of this two-element cell array lists categories assigned to the left child by this surrogate split, and the second element lists categories assigned to the right child. The order of the surrogate split variables at each node is matched to the order of variables in `SurrogateCutVar`. The optimal-split variable at this node does not appear. For nonbranch (leaf) nodes, `SurrogateCutCategories` contains an empty cell.

`SurrogateCutFlip`
An n-element cell array of the numeric cut assignments used for surrogate splits in `tree`, where n is the number of nodes in `tree`. For each node `k`, `SurrogateCutFlip{k}` is a numeric vector. The length of `SurrogateCutFlip{k}` is equal to the number of surrogate predictors found at this node. Every element of `SurrogateCutFlip{k}` is either zero for a categorical surrogate predictor, or a numeric cut assignment for a continuous surrogate predictor. The numeric cut assignment can be either –1 or +1. For every surrogate split with a numeric cut `C` based on a continuous predictor variable `Z`, the left child is chosen if `Z < C` and the cut assignment for this surrogate split is +1, or if `Z >= C` and the cut assignment is –1.
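As an illustrative sketch (not part of the reference text), the following shows how several of these properties fit together when inspecting the root split of a fitted tree; the actual values depend on the data and the fitted model:

```matlab
load fisheriris
ctree = compact(fitctree(meas,species));

% The root node is node 1; IsBranchNode is true if it splits.
ctree.IsBranchNode(1)

% Predictor name and cut point that define the root split
% (CutPoint is NaN for categorical splits and leaf nodes).
ctree.CutPredictor{1}
ctree.CutPoint(1)

% Child node numbers of the root; leaves have children [0 0].
ctree.Children(1,:)

% Most probable class and class probabilities at the root.
ctree.NodeClass{1}
ctree.ClassProbability(1,:)
```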

## Object Functions

- `compareHoldout` Compare accuracies of two classification models using new data
- `edge` Classification edge
- `gather` Gather properties of Statistics and Machine Learning Toolbox object from GPU
- `lime` Local interpretable model-agnostic explanations (LIME)
- `loss` Classification error
- `margin` Classification margins
- `partialDependence` Compute partial dependence
- `plotPartialDependence` Create partial dependence plot (PDP) and individual conditional expectation (ICE) plots
- `predict` Predict labels using classification tree
- `predictorImportance` Estimates of predictor importance for classification tree
- `shapley` Shapley values
- `surrogateAssociation` Mean predictive measure of association for surrogate splits in classification tree
- `update` Update model parameters for code generation
- `view` View classification tree
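As a brief sketch of the most common of these functions, `predict` and `loss` can be called on a compact tree exactly as on a full tree (shown here on the training data only for illustration):

```matlab
load fisheriris
ctree = compact(fitctree(meas,species));

% Predict class labels and per-class scores for some observations.
[label,score] = predict(ctree,meas(1:5,:));

% Classification error; a compact tree needs the labels passed in,
% since it does not store the training data.
L = loss(ctree,meas,species);
```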

## Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects.

## Examples


Construct a compact classification tree for the Fisher iris data.

```matlab
load fisheriris
tree = fitctree(meas,species);
ctree = compact(tree);
```

Compare the size of the resulting tree to that of the original tree.

```matlab
t = whos('tree');  % t.bytes = size of tree in bytes
c = whos('ctree'); % c.bytes = size of ctree in bytes
[c.bytes t.bytes]
```

```
ans = 1×2

        5097       11762
```

The compact tree is smaller than the original tree.
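Although the compact tree no longer stores the training data, it retains everything needed for classification. For instance, continuing with the `ctree` created above, you can classify a new observation (here, simply the mean of the training measurements):

```matlab
% Classify a new observation with the compact tree.
meanflower = mean(meas);
label = predict(ctree,meanflower)
```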
