To determine a good lasso-penalty strength for a linear regression model that uses least squares, implement 5-fold cross-validation.
Simulate 10000 observations from this model
is a 10000-by-1000 sparse matrix with 10% nonzero standard normal elements.
e is random normal error with mean 0 and standard deviation 0.3.
Create a set of 15 logarithmically-spaced regularization strengths from through .
Cross-validate the models. To increase execution speed, transpose the predictor data and specify that the observations are in columns. Optimize the objective function using SpaRSA.
CVMdl
is a RegressionPartitionedLinear
model. Because fitrlinear
implements 5-fold cross-validation, CVMdl
contains 5 RegressionLinear
models that the software trains on each fold.
Display the first trained linear regression model.
Mdl1 =
RegressionLinear
ResponseName: 'Y'
ResponseTransform: 'none'
Beta: [1000x15 double]
Bias: [1x15 double]
Lambda: [1x15 double]
Learner: 'leastsquares'
Properties, Methods
Mdl1
is a RegressionLinear
model object. fitrlinear
constructed Mdl1
by training on the first four folds. Because Lambda
is a sequence of regularization strengths, you can think of Mdl1
as 15 models, one for each regularization strength in Lambda
.
Estimate the cross-validated MSE.
Higher values of Lambda
lead to predictor variable sparsity, which is a good quality of a regression model. For each regularization strength, train a linear regression model using the entire data set and the same options as when you cross-validated the models. Determine the number of nonzero coefficients per model.
In the same figure, plot the cross-validated MSE and frequency of nonzero coefficients for each regularization strength. Plot all variables on the log scale.
Choose the index of the regularization strength that balances predictor variable sparsity and low MSE (for example, Lambda(10)
).
Extract the model with corresponding to the minimal MSE.
MdlFinal =
RegressionLinear
ResponseName: 'Y'
ResponseTransform: 'none'
Beta: [1000x1 double]
Bias: -0.0050
Lambda: 0.0037
Learner: 'leastsquares'
Properties, Methods
EstCoeff = 2×1
1.0051
1.9965
MdlFinal
is a RegressionLinear
model with one regularization strength. The nonzero coefficients EstCoeff
are close to the coefficients that simulated the data.