`t = classregtree(X,y)`

`t = classregtree(X,y,'Name',value)`

`t = classregtree(X,y)` creates a decision tree `t` for predicting the response `y` as a function of the predictors in the columns of `X`. `X` is an *n*-by-*m* matrix of predictor values. If `y` is a vector of *n* response values, `classregtree` performs regression. If `y` is a categorical variable, character array, or cell array of strings, `classregtree` performs classification. Either way, `t` is a binary tree where each branching node is split based on the values of a column of `X`.

`NaN` values in `X` or `y` are taken to be missing values. Observations with all missing values for `X` or missing values for `y` are not used in the fit. Observations with some missing values for `X` are used to find splits on variables for which these observations have valid values.
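The two modes and the missing-value rules described above can be tried with a short script. This is a minimal sketch, assuming a Statistics Toolbox release that still provides `classregtree` and the bundled `fisheriris` sample data (variables `meas` and `species`); the choice of columns and the injected `NaN` values are purely illustrative.

```matlab
% Assumes classregtree and the fisheriris sample data are available.
load fisheriris

% Classification: species is a cell array of strings.
tc = classregtree(meas, species);

% Regression: predict column 4 of meas from columns 1 to 3.
tr = classregtree(meas(:,1:3), meas(:,4));

% Illustrative missing values (not part of the original data).
X = meas(:,1:3);
y = meas(:,4);
X(1,2) = NaN;   % some predictors missing: row is still used for other splits
X(2,:) = NaN;   % all predictors missing: row is not used in the fit
y(3)   = NaN;   % response missing: row is not used in the fit
tm = classregtree(X, y);

% Report which kind of tree was built from each response type.
disp(type(tc))
disp(type(tr))
```

The `type` method is used here only to confirm which mode was selected from the class of `y`.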

`t = classregtree(X,y,'Name',value)` specifies one or more optional parameter name/value pairs. Specify `Name` in single quotes. The following options are available:

For all trees:

- `categorical`: Vector of indices of the columns of `X` that are to be treated as unordered categorical variables.
- `method`: Either `'classification'` (default if `y` is text or a categorical variable) or `'regression'` (default if `y` is numeric).
- `names`: A cell array of names for the predictor variables, in the order in which they appear in the `X` from which the tree was created.
- `prune`: `'on'` (default) to compute the full tree and the optimal sequence of pruned subtrees, or `'off'` for the full tree without pruning.
- `minparent`: A number *k* such that impure nodes must have *k* or more observations to be split (default is `10`).
- `minleaf`: A minimal number of observations per tree leaf (default is `1`). If you supply both `'minparent'` and `'minleaf'`, `classregtree` uses the setting that results in larger leaves: `minparent = max(minparent,2*minleaf)`.
- `mergeleaves`: `'on'` (default) to merge leaves that originate from the same parent node and give the sum of risk values greater than or equal to the risk associated with the parent node. If `'off'`, `classregtree` does not merge leaves.
- `nvartosample`: Number of predictor variables randomly selected for each split. By default all variables are considered for each decision split.
- `stream`: Random number stream. Default is the MATLAB default random number stream.
- `surrogate`: `'on'` to find surrogate splits at each branch node. Default is `'off'`. If you set this parameter to `'on'`, `classregtree` can run significantly slower and consume significantly more memory.
- `weights`: Vector of observation weights. By default the weight of every observation is 1. The length of this vector must be equal to the number of rows in `X`.
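The interaction between `'minparent'` and `'minleaf'` can be checked by hand before calling the function. The sketch below uses hypothetical settings (10 and 8 are illustrative, not recommended values) and assumes `classregtree` and the `fisheriris` sample data are available; the short predictor names are also made up for the example.

```matlab
% Hypothetical user-supplied settings.
minparent = 10;
minleaf   = 8;

% Rule quoted in the option list: the larger of the two constraints wins.
effectiveMinparent = max(minparent, 2*minleaf);
fprintf('effective minparent = %d\n', effectiveMinparent);

% Pass several name/value pairs in one call (assumes classregtree exists).
load fisheriris
t = classregtree(meas, species, ...
    'names', {'SL', 'SW', 'PL', 'PW'}, ...   % illustrative predictor names
    'minparent', minparent, ...
    'minleaf', minleaf, ...
    'prune', 'off');
```

With these values, a node needs at least 16 observations to be split, since `2*minleaf` exceeds `minparent`.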

For regression trees only:

- `qetoler`: Defines tolerance on quadratic error per node for regression trees. Splitting nodes stops when quadratic error per node drops below `qetoler*qed`, where `qed` is the quadratic error for the entire data computed before the decision tree is grown: `qed = norm(y-ybar)` with `ybar` estimated as the average of the input array `y`. Default value is `1e-6`.
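The stopping threshold can be computed directly from the formula above using base MATLAB only. The response vector here is invented for illustration; the variable name `threshold` is not part of the `classregtree` interface.

```matlab
% Illustrative response vector (not from any real data set).
y = [1; 2; 3; 4; 10];

ybar = mean(y);            % average of the responses
qed  = norm(y - ybar);     % quadratic error for the entire data

qetoler   = 1e-6;          % default tolerance stated in the option list
threshold = qetoler * qed; % splitting stops below this per-node error

fprintf('qed = %.6f, threshold = %.3e\n', qed, threshold);
```

Because the threshold scales with `qed`, the same `qetoler` behaves consistently across responses measured on different scales.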

For classification trees only:

- `cost`: Square matrix `C`, where `C(i,j)` is the cost of classifying a point into class `j` if its true class is `i` (default has `C(i,j)=1` if `i~=j`, and `C(i,j)=0` if `i=j`). Alternatively, this value can be a structure `S` having two fields: `S.group` containing the group names as a categorical variable, character array, or cell array of strings; and `S.cost` containing the cost matrix `C`.
- `splitcriterion`: Criterion for choosing a split. One of `'gdi'` (default) for Gini's diversity index, `'twoing'` for the twoing rule, or `'deviance'` for maximum deviance reduction.
- `priorprob`: Prior probabilities for each class, specified as a string (`'empirical'` or `'equal'`), as a vector (one value for each distinct group name), or as a structure `S` with two fields: `S.group` containing the group names as a categorical variable, character array, or cell array of strings; and `S.prob` containing a vector of corresponding probabilities. If the input value is `'empirical'` (default), class probabilities are determined from class frequencies in `y`. If the input value is `'equal'`, all class probabilities are set equal. If both observation weights and class prior probabilities are supplied, the weights are renormalized to add up to the value of the prior probability in the respective class.
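The default cost matrix, the two string-valued priors, and the weight renormalization can be sketched in a few lines. This is one reading of the descriptions above, not the toolbox's internal code: the toy labels, the penalty of 5, and the helper variable names are all illustrative, and the final call assumes `classregtree` and the `fisheriris` sample data are available.

```matlab
% Default cost for k classes: 0 on the diagonal, 1 elsewhere.
k = 3;
C = ones(k) - eye(k);

% Custom cost supplied as a structure with fields group and cost.
S.group = {'setosa'; 'versicolor'; 'virginica'};
S.cost  = C;
S.cost(3,2) = 5;   % true class 3 (virginica) classified as class 2 (versicolor)

% Toy labels to illustrate 'empirical' versus 'equal' priors.
labels = {'a'; 'a'; 'b'; 'a'; 'c'; 'b'};
[groups, ~, idx] = unique(labels);
idx = idx(:);
counts = accumarray(idx, 1);
priorEmpirical = counts / sum(counts);                % class frequencies
priorEqual = ones(numel(groups), 1) / numel(groups);  % same value per class

% Renormalize unit weights so each class sums to its prior (assumed reading).
w = ones(numel(labels), 1);
for c = 1:numel(groups)
    inClass = (idx == c);
    w(inClass) = w(inClass) * priorEqual(c) / sum(w(inClass));
end

% Classification-only options in a single call (assumes classregtree exists).
load fisheriris
t = classregtree(meas, species, ...
    'cost', S, 'priorprob', 'equal', 'splitcriterion', 'twoing');
```

After the loop, the weights in each class add up to that class's prior, so the weights as a whole sum to 1.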

