sMC, Variable Importance Selection in PLS - Partial Least Squares

Significance Multivariate Correlation (sMC) for variable selection vs. VIP and SR
Updated 1 Aug 2017

----
Notes: The sMC-PLS method is sensitive to the quality of the PLS model. A poor-quality PLS model may contain a significant amount of irrelevant data variation, which affects the variable selection by sMC. Hence, it is strongly recommended that sMC be used together with WRT-PLS to assess the quality of the model for all components. MATLAB code for WRT-PLS is available at the link below.
https://nl.mathworks.com/matlabcentral/fileexchange/63441-wrtpls--selection-of-the-number-of-components-in-pls-partial-least-squares
------
[1] DOI: http://dx.doi.org/10.1016/j.chemolab.2014.08.005
T.N. Tran, N.L. Afanador, L.M.C. Buydens, L. Blanchet, Interpretation of variable importance in Partial Least Squares with Significance Multivariate Correlation (sMC), Chemom. Intell. Lab. Syst. 138 (2014) 153–160.
[2] DOI: http://dx.doi.org/10.1016/j.chemolab.2014.09.008
N.L. Afanador, T.N. Tran, L. Blanchet, L.M.C. Buydens, Variable importance in PLS in the presence of autocorrelated data: Case studies in manufacturing processes, Chemom. Intell. Lab. Syst. 139 (2014) 139–145.
Highlights:
• Basic sequence theory is presented as a special case of the famous Krylov sequence.
• Variable importance in PLS via SR and VIP can be affected by basic sequence rotation.
• sMC is developed using the theoretical background of basic sequence.
• We introduce the application of an autocorrelation correction factor formulation in PLS regression.
• We compare the performance of the correction factor via various important variable selection methods.
Despite gaining popularity and success in many modeling applications, Partial Least Squares (PLS) regression continues to provide challenges in the evaluation of important variables. This article describes the relationship between the regression coefficients and orthogonally decomposed variances in PLS. The relation between prediction, model interpretation, and important variable determination is described using the theory of the basic sequence presented here as a special case of the famous Krylov sequence (or the power method).
Variable selection methods such as Selectivity Ratio (SR) and Variable Importance in the Projection (VIP) are also described in this framework. We show that their interpretation can be affected by unnecessary rotation toward the main source of variance in the X-block. Significance Multivariate Correlation (sMC) is developed using the knowledge obtained from the basic sequence to minimize the effect of irrelevant X-structures. At the same time, sMC highlights the variables most correlated with the response. The performance of sMC is demonstrated on simulated and real datasets against the commonly used variable selection methods VIP and SR.
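The idea described above can be illustrated with a short sketch. It assumes the formulation of sMC as a per-variable F-ratio between the part of X aligned with the regression vector and the part orthogonal to it, with 1 and n-2 degrees of freedom. The function name `smc_sketch` is hypothetical, the sketch omits the autocorrelation correction of [2], and it is not the `smc.m` shipped with this submission.

```matlab
function [F, Fcrit] = smc_sketch(b, X, alpha)
% SMC_SKETCH  Illustrative sMC F-statistics WITHOUT autocorrelation correction.
% Hypothetical helper for illustration only; not the smc.m of this submission.
%   b     : p-by-1 PLS regression coefficient vector
%   X     : n-by-p mean-centered predictor matrix
%   alpha : significance level (0.01 assumed as default)
if nargin < 3, alpha = 0.01; end
b = b(:);
n = size(X, 1);

% Project X onto the direction of the regression vector.
t    = X * b / (b' * b);     % n-by-1 scores along b
Xsmc = t * b';               % part of X aligned with b
E    = X - Xsmc;             % part of X orthogonal to b

% Per-variable F-ratio: 1 degree of freedom for the regression part,
% n-2 for the residual part (assumed).
msReg = sum(Xsmc.^2, 1);
msRes = sum(E.^2, 1) / (n - 2);
F     = msReg ./ msRes;

% Critical value of the F distribution
% (finv requires the Statistics and Machine Learning Toolbox).
Fcrit = finv(1 - alpha, 1, n - 2);
end
```

Saved as `smc_sketch.m`, it could be called as `[F, Fcrit] = smc_sketch(b, Xm, 0.01);` with a coefficient vector `b` and a mean-centered matrix `Xm`.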
An integral part of interpreting atypical process performance in manufacturing processes is a multivariate understanding of process parameters and their relationship to a product's critical quality attributes. In this endeavor, Partial Least Squares (PLS) has greatly advanced the analysis of data that exhibit a high level of multicollinearity, but the impact of autocorrelation on important variable selection has not been fully explored, particularly autocorrelation in the residuals, wherein a current observation is correlated to some degree with the previous observation(s). This autocorrelation makes it harder to understand model performance and to select important variables. This paper introduces an autocorrelation correction factor formulation for PLS in an attempt to address this concern and illustrates its application to the recently proposed Significance Multivariate Correlation (sMC) variable selection method. Our results demonstrate that the correction factor formulation presented in this paper has the desired effect of driving down the false positive rate when applied to sMC.
Keywords: PLS, Regression, Model Interpretation, Variable Importance, Variable Selection

EXAMPLE: sMC on the octane NIR data (Kalivas, J. H., Two data sets of near infrared spectra. Chemom. Intell. Lab. Syst. 1997, 37 (2), 255-259).
load spectra                        % provides NIR and octane (data set ships with the Statistics and Machine Learning Toolbox)
X = NIR;
y = octane;
Xm = bsxfun(@minus, X, mean(X));    % mean-centered X
ym = y - mean(y);                   % mean-centered y
b = nipals_pls1(Xm, ym, 6);         % 6-component PLS1 model; see the attached NIPALS PLS code

% Recommended: significance level 0.01 with autocorrelation-corrected sMC (default call)
[smcF, smcFcrit] = smc(b, Xm);
figure; plot(smcF); hold on; plot([1 length(smcF)], [smcFcrit smcFcrit], '--r')

% Faster: significance level 0.01 without autocorrelation correction (assumes independent sampling)
[smcF, smcFcrit] = smc(b, Xm, 0.01, false);
figure; plot(smcF); hold on; plot([1 length(smcF)], [smcFcrit smcFcrit], '--r')
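
Once `smcF` and `smcFcrit` are available, the variables flagged as important are those whose F-value lies above the dashed critical line in the plots. A minimal sketch of extracting them follows. It assumes the variables from the example are still in the workspace; the name `selected` is illustrative and not part of the package, and the strict inequality is an assumption.

```matlab
% Assumes smcF and smcFcrit from the example above are in the workspace.
selected = find(smcF > smcFcrit);   % indices of variables above the critical F-value
fprintf('%d of %d variables exceed the critical F-value\n', numel(selected), numel(smcF));

% Mark the selected variables on the sMC profile.
figure; plot(smcF); hold on;
plot(selected, smcF(selected), 'ro');
plot([1 length(smcF)], [smcFcrit smcFcrit], '--r');
```

The indices in `selected` refer to columns of `Xm`, so `Xm(:, selected)` would give the reduced X-block for refitting a model.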

Cite As

Thanh Tran (2024). sMC, Variable Importance Selection in PLS - Partial Least Squares (https://www.mathworks.com/matlabcentral/fileexchange/48024-smc-variable-importance-selection-in-pls-partial-least-squares), MATLAB Central File Exchange.

MATLAB Release Compatibility
Created with R2013b
Compatible with any release
Platform Compatibility
Windows macOS Linux

Version: Release Notes

2.1.0.0: Added notes and a link to WRT-PLS for quality assessment. MATLAB code for WRT-PLS is available at https://nl.mathworks.com/matlabcentral/fileexchange/63441-wrtpls--selection-of-the-number-of-components-in-pls-partial-least-squares
2.0.0.0: Updated the sMC example on the octane NIR data (see description) ... Corrected sMC with autocorrelation information (recommended; see [2])
1.9.0.0: -
1.8.0.0: -
1.7.0.0: [Updated 06Jan2015] Description
1.6.0.0: -
1.5.0.0: -
1.4.0.0: -
1.3.0.0: [06Oct] Updated title and cited reference DOI: http://dx.doi.org/10.1016/j.chemolab.2014.08.005
1.2.0.0: [06Oct] Updated references
1.1.0.0: [05Oct] Updated descriptions
1.0.0.0