
classregtree

Class: classregtree

Construct classification and regression trees

Syntax

t = classregtree(X,y)
t = classregtree(X,y,'Name',value)

Description

    Note:   This function is superseded by fitctree and fitrtree, which construct ClassificationTree and RegressionTree objects. It is maintained only for backward compatibility.

t = classregtree(X,y) creates a decision tree t for predicting the response y as a function of the predictors in the columns of X. X is an n-by-m matrix of predictor values. If y is a vector of n response values, classregtree performs regression. If y is a categorical variable, character array, or cell array of strings, classregtree performs classification. Either way, t is a binary tree where each branching node is split based on the values of a column of X. NaN values in X or y are taken to be missing values. Observations with all missing values for X or missing values for y are not used in the fit. Observations with some missing values for X are used to find splits on variables for which these observations have valid values.
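As a minimal sketch of the two modes (using the fisheriris sample data shipped with the toolbox; variable names here are illustrative, not from this page):

```matlab
load fisheriris                       % meas: 150-by-4 numeric; species: cell array of strings
tc = classregtree(meas, species);     % y is text, so classregtree grows a classification tree
yr = meas(:,1);                       % a numeric response instead ...
tr = classregtree(meas(:,2:4), yr);   % ... makes classregtree grow a regression tree
```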

t = classregtree(X,y,'Name',value) specifies one or more optional parameter name/value pairs. Specify Name in single quotes. The following options are available:

For all trees:

  • categorical — Vector of indices of the columns of X that are to be treated as unordered categorical variables

  • method — Either 'classification' (default if y is text or a categorical variable) or 'regression' (default if y is numeric).

  • names — A cell array of names for the predictor variables, in the order in which they appear as columns of X.

  • prune — 'on' (default) to compute the full tree and the optimal sequence of pruned subtrees, or 'off' for the full tree without pruning.

  • minparent — A number k such that impure nodes must have k or more observations to be split (default is 10).

  • minleaf — A minimal number of observations per tree leaf (default is 1). If you supply both 'minparent' and 'minleaf', classregtree uses the setting which results in larger leaves: minparent = max(minparent,2*minleaf)

  • mergeleaves — 'on' (default) to merge leaves that originate from the same parent node and whose summed risk is greater than or equal to the risk associated with the parent node. If 'off', classregtree does not merge leaves.

  • nvartosample — Number of predictor variables randomly selected for each split. By default all variables are considered for each decision split.

  • stream — Random number stream. Default is the MATLAB default random number stream.

  • surrogate — 'on' to find surrogate splits at each branch node. Default is 'off'. If you set this parameter to 'on', classregtree can run significantly slower and consume significantly more memory.

  • weights — Vector of observation weights. By default the weight of every observation is 1. The length of this vector must be equal to the number of rows in X.
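The options above can be combined in one call. The following sketch, on made-up data (all variable names and values here are hypothetical), marks a predictor column as categorical and shows how 'minparent' and 'minleaf' interact:

```matlab
rng default                                    % reproducible example data (assumed)
X = [randn(100,2), randi(3,100,1)];            % third column holds unordered codes 1..3
y = X(:,1) + (X(:,3)==2) + 0.1*randn(100,1);   % numeric response -> regression tree
t = classregtree(X, y, ...
        'categorical', 3, ...                  % treat column 3 as unordered categorical
        'minparent', 20, ...                   % split only impure nodes with >= 20 observations
        'minleaf', 5, ...                      % effective minparent = max(20, 2*5) = 20
        'names', {'x1' 'x2' 'grp'});
```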

For regression trees only:

  • qetoler — Defines tolerance on quadratic error per node for regression trees. Splitting nodes stops when quadratic error per node drops below qetoler*qed, where qed is the quadratic error for the entire data computed before the decision tree is grown: qed = norm(y-ybar) with ybar estimated as the average of the input array Y. Default value is 1e-6.
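Raising 'qetoler' from its 1e-6 default stops splitting earlier and yields a coarser tree, as in this sketch on assumed synthetic data:

```matlab
x = (1:100)';                                % single predictor (assumed example data)
y = sin(x/10) + 0.1*randn(100,1);            % noisy response
t = classregtree(x, y, 'qetoler', 1e-2);     % nodes stop splitting once per-node
                                             % quadratic error < 1e-2 * qed
```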

For classification trees only:

  • cost — Square matrix C, where C(i,j) is the cost of classifying a point into class j if its true class is i (default has C(i,j)=1 if i~=j, and C(i,j)=0 if i=j). Alternatively, this value can be a structure S having two fields: S.group containing the group names as a categorical variable, character array, or cell array of strings; and S.cost containing the cost matrix C.

  • splitcriterion — Criterion for choosing a split: 'gdi' (default) for Gini's diversity index, 'twoing' for the twoing rule, or 'deviance' for maximum deviance reduction.

  • priorprob — Prior probabilities for each class, specified as a string ('empirical' or 'equal') or as a vector (one value for each distinct group name) or as a structure S with two fields:

    • S.group containing the group names as a categorical variable, character array, or cell array of strings

    • S.prob containing a vector of corresponding probabilities.

    If the input value is 'empirical' (default), class probabilities are determined from class frequencies in y. If the input value is 'equal', all class probabilities are set equal. If both observation weights and class prior probabilities are supplied, the weights are renormalized to add up to the value of the prior probability in the respective class.
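As a sketch of the structure forms for 'cost' and 'priorprob' (the particular cost values below are invented for illustration):

```matlab
load fisheriris
S.group = {'setosa'; 'versicolor'; 'virginica'};
S.cost  = [0 1 1;                       % rows: true class, columns: assigned class
           1 0 2;                       % e.g. true versicolor classified as
           1 2 0];                      % virginica costs 2 (assumed values)
t = classregtree(meas, species, ...
        'cost', S, ...                  % structure with S.group and S.cost
        'priorprob', 'equal');          % equal priors instead of class frequencies
```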

Examples


Plot a Classification Tree

Create a classification decision tree for Fisher's iris data:

load fisheriris;
t = classregtree(meas,species,...
                 'names',{'SL' 'SW' 'PL' 'PW'})
t = 

Decision tree for classification
1  if PL<2.45 then node 2 elseif PL>=2.45 then node 3 else setosa
2  class = setosa
3  if PW<1.75 then node 4 elseif PW>=1.75 then node 5 else versicolor
4  if PL<4.95 then node 6 elseif PL>=4.95 then node 7 else versicolor
5  class = virginica
6  if PW<1.65 then node 8 elseif PW>=1.65 then node 9 else versicolor
7  class = virginica
8  class = versicolor
9  class = virginica

view(t)
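A fitted tree can then classify observations with the classregtree eval method. A brief sketch, reusing t from above:

```matlab
sfit = eval(t, meas);                    % predicted class labels (cell array of strings)
pct = mean(strcmp(sfit, species))        % fraction of training observations
                                         % classified correctly
```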

References

[1] Breiman, L., J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Boca Raton, FL: CRC Press, 1984.
