RegressionPartitionedModel

Package: classreg.learning.partition

Cross-validated regression model

Description

RegressionPartitionedModel is a set of regression models trained on cross-validated folds. Estimate the quality of the regression by cross-validation using one or more “kfold” methods: kfoldPredict, kfoldLoss, and kfoldfun. Every “kfold” method uses models trained on in-fold observations to predict the response for out-of-fold observations.

For example, suppose you cross-validate using five folds. In this case, every training fold contains roughly 4/5 of the data, and every test fold contains roughly 1/5 of the data. The first model, stored in Trained{1}, was trained on X and Y with the first 1/5 excluded; the second model, stored in Trained{2}, was trained on X and Y with the second 1/5 excluded; and so on. When you call kfoldPredict, it computes predictions for the first 1/5 of the data using the first model, for the second 1/5 of the data using the second model, and so on. In short, kfoldPredict computes the response for every observation using the model trained without that observation.
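The fold logic described above can be sketched by hand. The following is a minimal illustration, not the toolbox's internal implementation; it assumes the carsmall sample data and a five-fold regression tree, and the variable names (yhat, idxTest, maxDiff) are chosen here for illustration only.

```matlab
% Sketch: reproduce the kfoldPredict fold logic manually (assumed setup: carsmall, 5 folds).
load carsmall
X = [Horsepower Weight];
rng(1)                                    % fix the random fold assignment
CVMdl = fitrtree(X,MPG,'KFold',5);

yhat = NaN(CVMdl.NumObservations,1);
for k = 1:numel(CVMdl.Trained)
    idxTest = test(CVMdl.Partition,k);    % logical index of out-of-fold observations
    % Predict fold k's held-out observations with the model trained without them.
    yhat(idxTest) = predict(CVMdl.Trained{k},CVMdl.X(idxTest,:));
end

% Compare with the built-in method; the difference is expected to be zero or negligible.
maxDiff = max(abs(yhat - kfoldPredict(CVMdl)));
```

Each observation belongs to exactly one test fold, so the loop fills every element of yhat exactly once.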

Construction

CVMdl = crossval(Mdl) creates a cross-validated regression model from a regression model (Mdl).

Alternatively:

  • CVNetMdl = fitrnet(X,Y,Name,Value)

  • CVTreeMdl = fitrtree(X,Y,Name,Value)

Each function returns a cross-validated model when you specify one of the name-value arguments 'CrossVal', 'KFold', 'Holdout', 'Leaveout', or 'CVPartition'. For syntax details, see fitrnet and fitrtree.
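As a rough sketch of the construction routes listed above, the following creates cross-validated regression trees in three ways. The carsmall data and the variable names (CVMdl1, CVMdl2, CVMdl3, numFolds) are assumptions made for illustration.

```matlab
% Sketch: three ways to obtain a cross-validated regression tree (assumed data: carsmall).
load carsmall
X = [Horsepower Weight];

Mdl = fitrtree(X,MPG);                    % ordinary (non-partitioned) regression tree
CVMdl1 = crossval(Mdl);                   % cross-validate an existing model (10 folds by default)
CVMdl2 = crossval(Mdl,'KFold',5);         % same, with an explicit number of folds
CVMdl3 = fitrtree(X,MPG,'Holdout',0.2);   % train and hold out 20% of the data in one call

% Number of trained learners stored in each cross-validated model.
numFolds = [numel(CVMdl1.Trained) numel(CVMdl2.Trained) numel(CVMdl3.Trained)];
```

Holdout validation trains a single model, so its Trained property holds one learner rather than one per fold.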

Input Arguments

Mdl

A regression model, specified as one of the following:

  • A neural network regression model trained using fitrnet

  • A regression tree trained using fitrtree

Properties

BinEdges

Bin edges for numeric predictors, specified as a cell array of p numeric vectors, where p is the number of predictors. Each vector includes the bin edges for a numeric predictor. The element in the cell array for a categorical predictor is empty because the software does not bin categorical predictors.

The software bins numeric predictors only if you specify the 'NumBins' name-value argument as a positive integer scalar when training a model with tree learners. The BinEdges property is empty if the 'NumBins' value is empty (default).

You can reproduce the binned predictor data Xbinned by using the BinEdges property of the trained model mdl.

X = mdl.X; % Predictor data
Xbinned = zeros(size(X));
edges = mdl.BinEdges;
% Find indices of binned predictors.
idxNumeric = find(~cellfun(@isempty,edges));
if iscolumn(idxNumeric)
    idxNumeric = idxNumeric';
end
for j = idxNumeric 
    x = X(:,j);
    % Convert x to array if x is a table.
    if istable(x) 
        x = table2array(x);
    end
    % Group x into bins by using the discretize function.
    xbinned = discretize(x,[-inf; edges{j}; inf]); 
    Xbinned(:,j) = xbinned;
end

Xbinned contains the bin indices, ranging from 1 to the number of bins, for numeric predictors. Xbinned values are 0 for categorical predictors. If X contains NaNs, then the corresponding Xbinned values are NaNs.

CategoricalPredictors

Categorical predictor indices, specified as a vector of positive integers. Assuming that the predictor data contains observations in rows, CategoricalPredictors contains index values corresponding to the columns of the predictor data that contain categorical predictors. If none of the predictors are categorical, then this property is empty ([]).

CrossValidatedModel

Name of the cross-validated model, a character vector.

KFold

Number of folds used in the cross-validated model, a positive integer.

ModelParameters

Object holding parameters of Mdl.

NumObservations

Number of observations in the training data stored in X and Y, specified as a numeric scalar.

Partition

The partition of class cvpartition used in the cross-validated model.

PredictorNames

A cell array of names for the predictor variables, in the order in which they appear in X.

ResponseName

Name of the response variable Y, a character vector.

ResponseTransform

Function handle for transforming the raw response values. The function handle should accept a matrix of response values and return a matrix of the same size. The default character vector 'none' means @(x)x, or no transformation.

Add or change a ResponseTransform function using dot notation:

CVMdl.ResponseTransform = @function

Trained

The trained learners, a cell array of compact regression models.

W

The scaled weights, a vector with length n, the number of observations in X.

X

A matrix or table of predictor values.

Y

A numeric column vector. Each entry in Y is the response value of the corresponding observation in X.

Object Functions

  • gather: Gather properties of Statistics and Machine Learning Toolbox object from GPU

  • kfoldLoss: Loss for cross-validated partitioned regression model

  • kfoldPredict: Predict responses for observations in cross-validated regression model

  • kfoldfun: Cross-validate function for regression

Copy Semantics

Value. To learn how value classes affect copy operations, see Copying Objects.

Examples

Load the sample data. Create a variable X containing the Horsepower and Weight data.

load carsmall
X = [Horsepower Weight];

Construct a cross-validated regression tree using the sample data. By default, fitrtree uses 10-fold cross-validation.

cvtree = fitrtree(X,MPG,'crossval','on');

Evaluate the cross-validation error of the carsmall data using Horsepower and Weight as predictor variables for mileage (MPG).

L = kfoldLoss(cvtree)
L = 25.5338
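To look beyond the single averaged loss, you can inspect the loss per fold or evaluate a custom metric with kfoldfun. The following is a minimal sketch under the same carsmall setup; the function handle maeFun and the variables Lfold and foldMAE are hypothetical names introduced here, and the values you obtain depend on the random fold assignment.

```matlab
% Sketch: per-fold losses and a custom per-fold metric (assumed setup: carsmall, 10 folds).
load carsmall
X = [Horsepower Weight];
rng(1)                                    % fix the random fold assignment
cvtree = fitrtree(X,MPG,'CrossVal','on');

% Mean squared error for each fold instead of the average over folds.
Lfold = kfoldLoss(cvtree,'Mode','individual');

% Custom metric: mean absolute error on each test fold.
% kfoldfun passes the compact fold model plus training and test data to the function.
maeFun = @(CMP,Xtrain,Ytrain,Wtrain,Xtest,Ytest,Wtest) ...
    mean(abs(Ytest - predict(CMP,Xtest)));
foldMAE = kfoldfun(cvtree,maeFun);
```

Comparing the per-fold values gives a rough sense of how much the error estimate varies across partitions.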

Extended Capabilities