Explain Model Predictions for Regression Models Trained in Regression Learner App

Understanding how some machine learning models make predictions can be difficult. Interpretability tools help reveal how predictors contribute (or do not contribute) to predictions. You can also use these tools to validate whether a model uses the correct evidence for its predictions, and find model biases that are not immediately apparent.

Regression Learner provides functionality for two levels of model interpretation: local and global.

| Level | Objective | Use Case | App Functionality |
|---|---|---|---|
| Local interpretation | Explain a prediction for a single query point. | • Identify important predictors for an individual prediction. • Examine a counterintuitive prediction. | Use LIME or Shapley values for a specified query point. See Explain Local Model Predictions Using LIME Values or Explain Local Model Predictions Using Shapley Values. |
| Global interpretation | Explain how a trained model makes predictions for the entire data set. | • Demonstrate how a trained model works. • Compare different models. | Use partial dependence plots for the predictors of interest. See Interpret Model Using Partial Dependence Plots. |

Explain Local Model Predictions Using LIME Values

Use LIME (local interpretable model-agnostic explanations) to interpret a prediction for a query point by fitting a simple interpretable model for the query point. The simple model acts as an approximation for the trained model and explains model predictions around the query point. The simple model can be a linear model or a decision tree model. You can use the estimated coefficients of a linear model or the estimated predictor importance of a decision tree model to explain the contribution of individual predictors to the prediction for the query point.

After you train a model in Regression Learner, select the model in the Models pane. On the Explain tab, in the Local Explanations section, click LIME. The app opens a new tab. In the left plot or table, select a query point. In the right plot or table, the app displays the LIME values corresponding to the query point. The app uses the `lime` function to compute the LIME values. When computing LIME values, the app uses the final model, trained on the full data set (including training and validation data, but excluding test data).

Note

Regression Learner does not support LIME explanations for models trained after applying feature selection or principal component analysis (PCA).
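Outside the app, a comparable explanation can be computed by calling the `lime` function directly. The following is a minimal sketch, not the app's internal code. It assumes the `carbig` sample data set and a regression tree trained with `fitrtree` as a stand-in for a model trained in Regression Learner; the variable names (`tblX`, `mdl`, `queryPoint`, `results`) are illustrative only.

```matlab
% Illustrative setup (assumption): carbig sample data and a regression tree
% standing in for a model trained in Regression Learner.
load carbig
tbl = rmmissing(table(Acceleration,Displacement,Horsepower,Weight,MPG));
tblX = removevars(tbl,"MPG");    % predictor data only
rng("default")                   % lime generates random synthetic data
mdl = fitrtree(tblX,tbl.MPG);

% Fit a simple model around the first observation, using at most
% three predictors, and plot the resulting LIME values.
queryPoint = tblX(1,:);
results = lime(mdl,"QueryPoint",queryPoint,"NumImportantPredictors",3);
plot(results)
```

Because no simple model type is specified, this sketch uses the default linear simple model, so the plotted values are coefficients.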

Select Query Point

To select a query point, you can use various controls.

• To the right of the LIME plots, under Data, choose whether to select a query point from the Training set data or Test set data. The training set refers to the data used to train the final model and includes all the observations that are not reserved for testing.

• Above the left plot, under Select Query Point, choose whether to select a query point from a plot (Plot) or a table (Table). If using a plot, click a point in the plot to designate the associated observation as the query point. If using a table, click a row in the table to select the associated observation as the query point.

Alternatively, select a query point using the index of the observation in the selected data set. To the right of the LIME plots, under Query Point, enter the observation index.

• To make selecting a query point from a plot easier, you can change the plot display by using the controls below the left plot. You can specify the plot type, select the x-axis and y-axis variables, and choose the values to display (such as true responses, predicted responses, and errors).

• After selecting a query point, you can expand the LIME Explanations display by hiding the Select Query Point display. To the right of the LIME plots, under Data, clear the Show query points check box.

Plot LIME Explanations

Given a query point, view its LIME values by using the LIME Explanations display. Choose whether to view the results using a bar graph (Plot) or a table (Table). The table includes the predictor values at the query point.

The meaning of the LIME values depends on the type of LIME model used. To the right of the LIME plots, in the Simple Model section under LIME Options, specify the type of simple model to use for approximating the behavior of the trained model.

• If you use a `Linear` simple model, the LIME values correspond to the coefficient values of the simple model. The bar graph shows the coefficients, sorted by their absolute values. For each categorical predictor, the software creates one less dummy variable than the number of categories, and the bar graph displays only the most important dummy variable. You can check the coefficients of the other dummy variables using the `SimpleModel` property of the exported results object. For more information, see Export LIME Results.

• If you use a `Tree` simple model, the LIME values correspond to the estimated predictor importance values of the simple model. The bar graph shows the predictor importance values, sorted by their absolute values. The bar graph shows LIME values only for the subset of predictors included in the simple model.

Below the display of the LIME explanations, the app shows the query point predictions for the trained model (for example, Model 1 prediction) and the simple model (for example, LIME model prediction). If the two predictions are not close, the simple model is not a good approximation of the trained model at the query point. You can change the simple model so that it better matches the trained model at the query point by adjusting LIME options.
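At the command line, the same two predictions are available as properties of a `lime` object. The sketch below is an illustration under assumptions (the `carbig` sample data, a regression tree as the trained model, and hypothetical variable names). It reads the `BlackboxFitted` and `SimpleModelFitted` properties and reports the gap between them.

```matlab
% Illustrative setup (assumption): carbig sample data and a regression tree.
load carbig
tbl = rmmissing(table(Acceleration,Displacement,Horsepower,Weight,MPG));
tblX = removevars(tbl,"MPG");
rng("default")                   % for reproducible synthetic data
mdl = fitrtree(tblX,tbl.MPG);
results = lime(mdl,"QueryPoint",tblX(1,:),"NumImportantPredictors",3);

% Predictions of the trained model and of the simple model at the query point.
trainedPrediction = results.BlackboxFitted;
simplePrediction  = results.SimpleModelFitted;
gap = abs(trainedPrediction - simplePrediction);
fprintf("Trained model: %.2f, simple model: %.2f, gap: %.2f\n", ...
    trainedPrediction,simplePrediction,gap)
```

How small the gap must be is a judgment call that depends on the scale of the response; the sketch only reports the value.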

To adjust LIME options, you can use various controls to the right of the LIME plots, under LIME Options.

Under Simple Model, you can set these options:

• Simple model — Specify the type of simple model to use for approximating the behavior of the trained model. Choose between a linear model, which uses `fitrlinear`, and a decision tree, which uses `fitrtree`. For more information, see `SimpleModelType`.

In Regression Learner, linear simple models use a `BetaTolerance` value of 1e-8.

• Max num predictors — Specify the maximum number of predictors to use for training the simple model. For a linear simple model, this value indicates the maximum number of predictors to include in the model, not counting expanded categorical predictors. Unimportant predictors are not included in the simple model. For a tree simple model, this value indicates the maximum number of decision splits (or branch nodes) in the tree, which might cause the model to include fewer predictors than the specified maximum. For more information, see `numImportantPredictors`.

• Kernel width — Specify the width of the kernel function used to fit the simple model. Smaller kernel widths create LIME models that focus on data samples near the query point. For more information, see `KernelWidth`.

Under Synthetic Predictor Data, you can set these options:

• Num data samples — Specify the number of synthetic data samples to generate for training the simple model. For more information, see `NumSyntheticData`.

• Data locality — Specify the locality of the data to use for synthetic data generation. A `Global` locality uses all observations in the training set, and a `Local` locality uses the k-nearest neighbors of the query point. (Recall that the training set contains the data used to train the final model and includes all the observations that are not reserved for testing.) For more information, see `DataLocality`.

• Num neighbors — Specify the number of k-nearest neighbors for the query point. This option is valid only when the data locality is `Local`. For more information, see `NumNeighbors`.

For more information on the LIME algorithm and how synthetic data is used, see LIME.
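The options above correspond to name-value arguments of the `lime` function. The following sketch shows one possible combination. The data set, the trained model, and the specific option values are assumptions chosen for illustration, not recommended settings.

```matlab
% Illustrative setup (assumption): carbig sample data and a regression tree.
load carbig
tbl = rmmissing(table(Acceleration,Displacement,Horsepower,Weight,MPG));
tblX = removevars(tbl,"MPG");
rng("default")                   % for reproducible synthetic data
mdl = fitrtree(tblX,tbl.MPG);

% Each argument is annotated with the app option it mirrors.
results = lime(mdl, ...
    "QueryPoint",tblX(1,:), ...
    "SimpleModelType","tree", ...      % Simple model
    "NumImportantPredictors",2, ...    % Max num predictors
    "KernelWidth",0.5, ...             % Kernel width
    "NumSyntheticData",2000, ...       % Num data samples
    "DataLocality","local", ...        % Data locality
    "NumNeighbors",100);               % Num neighbors (local locality only)
plot(results)
```

With a tree simple model, the plotted values are predictor importance estimates rather than coefficients.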

Perform What-If Analysis

After computing the LIME results for a query point, you can perform what-if analysis and compare the LIME results for the original query point to the results for a custom query point. For example, you can see whether the important predictors change when the query point predictor values deviate slightly from their original values.

To the right of the LIME plots, under Query Point, select What-if analysis. The app creates a table that shows the predictor values for the original query point and a custom query point. Manually specify the predictor values of the custom query point by editing the Custom Value table entries. To better see the table entries, you can increase the width of the plot options panel by using the plus button + at the top of the panel.

After you specify a custom query point, the app updates the display of the LIME results.

• The query point plot shows the original query point as a black circle and the custom query point as a green square.

• The LIME explanations bar graph shows the LIME values for the original and custom query points, and differentiates the two sets of bars by using different colors and edge styles.

• The LIME explanations table includes the LIME and predictor values for both query points.

• Below the display of the LIME explanations, you can find the trained model and simple model predictions for both query points. Ensure that the two predictions for the custom query point are close. Otherwise, the simple model is not a good approximation of the trained model at the custom query point.
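A similar comparison can be scripted by refitting an existing `lime` object at a modified query point with the `fit` function. This is a sketch under assumptions: the `carbig` data, the regression tree, and the 10% change to `Weight` are hypothetical choices made only for illustration.

```matlab
% Illustrative setup (assumption): carbig sample data and a regression tree.
load carbig
tbl = rmmissing(table(Acceleration,Displacement,Horsepower,Weight,MPG));
tblX = removevars(tbl,"MPG");
rng("default")                   % for reproducible synthetic data
mdl = fitrtree(tblX,tbl.MPG);

% LIME results for the original query point.
queryPoint = tblX(1,:);
results = lime(mdl,"QueryPoint",queryPoint,"NumImportantPredictors",3);

% Custom query point: same observation with a hypothetical 10% larger weight.
customPoint = queryPoint;
customPoint.Weight = 1.1*queryPoint.Weight;
customResults = fit(results,customPoint,3);

% Compare the selected predictors and check the simple model at the custom point.
disp(results.ImportantPredictors)
disp(customResults.ImportantPredictors)
fprintf("Custom point: trained model %.2f, simple model %.2f\n", ...
    customResults.BlackboxFitted,customResults.SimpleModelFitted)
```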

Export LIME Results

After computing LIME values, you can export your results by using any of the following options in the Export section on the Explain tab.

• To export the LIME explanations bar graph to a figure, click the button for exporting the plot to a figure.

• To export the LIME explanations table to the workspace, click the export results button and select Export Results Table.

• To export the query point model explainer object to the workspace, click the export results button and select Export Results Object. If you specify a custom query point by using what-if analysis, the model explainer object corresponds to the custom query point. For more information on the explainer object, see `lime`.
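After export, the results object can be inspected like any other `lime` object. In the sketch below, a `lime` object created at the command line stands in for the exported object. The variable name `results`, the data set, and the model are assumptions. For a linear simple model, the `SimpleModel` property holds the model fitted by `fitrlinear`, and its `Beta` property contains the coefficients.

```matlab
% Stand-in for an exported results object (assumption): a lime object
% created directly from carbig sample data and a regression tree.
load carbig
tbl = rmmissing(table(Acceleration,Displacement,Horsepower,Weight,MPG));
tblX = removevars(tbl,"MPG");
rng("default")                   % for reproducible synthetic data
mdl = fitrtree(tblX,tbl.MPG);
results = lime(mdl,"QueryPoint",tblX(1,:),"NumImportantPredictors",3);

% Inspect the fitted linear simple model and the predictors it uses.
simpleMdl = results.SimpleModel;       % linear model fitted by fitrlinear
coefficients = simpleMdl.Beta;         % one coefficient per (expanded) predictor
importantIdx = results.ImportantPredictors;
disp(coefficients)
disp(importantIdx)
```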

Explain Local Model Predictions Using Shapley Values

The Shapley value of a predictor for a query point explains how much of the deviation of the query point prediction from the average prediction is due to that predictor. For regression models, predictions are response values. For a query point, the sum of the Shapley values for all predictors corresponds to the total deviation of the prediction from the average.

After you train a model in Regression Learner, select the model in the Models pane. On the Explain tab, in the Local Explanations section, click Local Shapley. The app opens a new tab. In the left plot or table, select a query point. In the right plot or table, the app displays the Shapley values corresponding to the query point. The app uses the `shapley` function to compute the Shapley values. When computing Shapley values, the app uses the final model, trained on the full data set (including training and validation data, but excluding test data).

Note

Regression Learner does not support Shapley explanations for models trained after applying feature selection or PCA.
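A comparable computation at the command line uses the `shapley` function together with `fit`. The sketch below is illustrative only. It assumes the `carbig` sample data and a regression tree in place of a model trained in the app, and all variable names are hypothetical.

```matlab
% Illustrative setup (assumption): carbig sample data and a regression tree.
load carbig
tbl = rmmissing(table(Acceleration,Displacement,Horsepower,Weight,MPG));
tblX = removevars(tbl,"MPG");
mdl = fitrtree(tblX,tbl.MPG);

% Create the explainer, then compute Shapley values for one query point.
queryPoint = tblX(1,:);
explainer = shapley(mdl);
explainer = fit(explainer,queryPoint);

disp(explainer.ShapleyValues)    % table of predictors and Shapley values
plot(explainer)                  % horizontal bar graph
```

Creating the explainer first and then calling `fit` keeps the two steps separate, so the same explainer can be reused for other query points.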

Select Query Point

To select a query point, you can use various controls.

• To the right of the Shapley plots, under Data, choose whether to select a query point from the Training set data or Test set data. The training set refers to the data used to train the final model and includes all the observations that are not reserved for testing.

• Above the left plot, under Select Query Point, choose whether to select a query point from a plot (Plot) or a table (Table). If using a plot, click a point in the plot to designate the associated observation as the query point. If using a table, click a row in the table to select the associated observation as the query point.

Alternatively, select a query point using the index of the observation in the selected data set. To the right of the Shapley plots, under Query Point, enter the observation index.

• To make selecting a query point from a plot easier, you can change the plot display by using the controls below the left plot. You can specify the plot type, select the x-axis and y-axis variables, and choose the values to display (such as true responses, predicted responses, and errors).

• After selecting a query point, you can expand the Shapley Explanations display by hiding the Select Query Point display. To the right of the Shapley plots, under Data, clear the Show query points check box.

Plot Shapley Explanations

Given a query point, view its Shapley values by using the Shapley Explanations display. Each Shapley value explains the deviation of the prediction for the query point from the average prediction, due to the corresponding predictor. Choose whether to view the results using a bar graph (Plot) or a table (Table). The horizontal bar graph shows the Shapley values for all predictors, sorted by their absolute values. The table includes the predictor values at the query point along with the Shapley values. Below the display of the Shapley explanations, the app shows the query point prediction and the average model prediction. The sum of the Shapley values equals the difference between the two predictions.
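The additivity property described above can be checked numerically on a `shapley` object. The following sketch assumes the `carbig` sample data, a regression tree, and the `ShapleyValues`, `BlackboxFitted`, and `Intercept` properties as documented for `shapley`; property layouts can differ between releases, so treat this as an illustration.

```matlab
% Illustrative setup (assumption): carbig sample data and a regression tree.
load carbig
tbl = rmmissing(table(Acceleration,Displacement,Horsepower,Weight,MPG));
tblX = removevars(tbl,"MPG");
mdl = fitrtree(tblX,tbl.MPG);
explainer = fit(shapley(mdl),tblX(1,:));

% Sum of the Shapley values versus (query point prediction - average prediction).
phi = explainer.ShapleyValues.ShapleyValue;
deviation = explainer.BlackboxFitted - explainer.Intercept;
fprintf("Sum of Shapley values:    %.4f\n",sum(phi))
fprintf("Prediction minus average: %.4f\n",deviation)
```

The two printed numbers should agree up to numerical precision.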

If the trained model includes many predictors, you can choose to display only the most important predictors in the bar graph. To the right of the Shapley plots, under Shapley Plot, specify the number of important predictors to show in the Shapley Explanations bar graph. The app displays the specified number of Shapley values with the largest absolute value.

To adjust Shapley options, you can use various controls to the right of the Shapley plots. Under Shapley Options, you can set these options:

• Num data samples — Specify the number of observations sampled from the training set to use for Shapley value computations. (Recall that the training set contains the data used to train the final model and includes all the observations that are not reserved for testing.) If the value equals the number of observations in the training set, the app uses every observation in the data set.

When the training set has over 1000 observations, the Shapley value computations can be slow. For faster computations, consider using a smaller number of data samples.

• Method — Specify the algorithm to use when computing Shapley values. The `Interventional` option computes Shapley values with an interventional value function. The app uses the Kernel SHAP, Linear SHAP, or Tree SHAP algorithm, depending on the trained model type and other specified options. The `Conditional` option uses the extension to the Kernel SHAP algorithm with a conditional value function. For more information, see `Method`.

• Max num subsets mode — Allow the app to choose the maximum number of predictor subsets automatically, or specify a value manually. You can check the number of predictor subsets used by querying the `NumSubsets` property of the exported results object. For more information, see Export Shapley Results.

• Manual max num subsets — When you set Max num subsets mode to `Manual`, specify the maximum number of predictor subsets to use for Shapley value computations. This option is valid only when the app uses the Kernel SHAP algorithm or the extension to the Kernel SHAP algorithm. For more information, see `MaxNumSubsets`.

For more information on the algorithms used to compute Shapley values, see Shapley Values for Machine Learning Model.
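These options have rough command-line counterparts in the `shapley` function. The sketch below is an approximation under assumptions, not the app's implementation. It mimics Num data samples by passing a random subset of the predictor data, and it uses the `"conditional"` method name, which might be spelled differently in older releases. The subset size and the subset limit are arbitrary illustrative values.

```matlab
% Illustrative setup (assumption): carbig sample data and a regression tree.
load carbig
tbl = rmmissing(table(Acceleration,Displacement,Horsepower,Weight,MPG));
tblX = removevars(tbl,"MPG");
mdl = fitrtree(tblX,tbl.MPG);

% Num data samples: use 100 randomly chosen observations (assumed approach).
rng("default")
idx = randperm(height(tblX),100);

explainer = shapley(mdl,tblX(idx,:), ...
    "Method","conditional", ...      % Method
    "MaxNumSubsets",8);              % Manual max num subsets
explainer = fit(explainer,tblX(1,:));

% Number of predictor subsets actually used in the computation.
disp(explainer.NumSubsets)
```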

Perform What-If Analysis

After computing the Shapley results for a query point, you can perform what-if analysis and compare the Shapley results for the original query point to the results for a custom query point. For example, you can see whether the important predictors change when the query point predictor values deviate slightly from their original values.

To the right of the Shapley plots, under Query Point, select What-if analysis. The app creates a table that shows the predictor values for the original query point and a custom query point. Manually specify the predictor values of the custom query point by editing the Custom Value table entries. To better see the table entries, you can increase the width of the plot options panel by using the plus button + at the top of the panel.

After you specify a custom query point, the app updates the display of the Shapley results.

• The query point plot shows the original query point as a black circle and the custom query point as a green square.

• The Shapley explanations bar graph shows the Shapley values for the original and custom query points, and differentiates the two sets of bars by using different colors and edge styles.

• The Shapley explanations table includes the Shapley and predictor values for both query points.

• Below the display of the Shapley explanations, you can find the model predictions for both query points. For easy comparison, the app lists the average model prediction twice, once below each query point prediction.
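The same kind of side-by-side comparison can be sketched at the command line by fitting a `shapley` object at two query points. The data set, the model, the 10% change to `Weight`, and all variable names below are assumptions made for illustration.

```matlab
% Illustrative setup (assumption): carbig sample data and a regression tree.
load carbig
tbl = rmmissing(table(Acceleration,Displacement,Horsepower,Weight,MPG));
tblX = removevars(tbl,"MPG");
mdl = fitrtree(tblX,tbl.MPG);

% Shapley values for the original query point.
queryPoint = tblX(1,:);
explainer = fit(shapley(mdl),queryPoint);

% Shapley values for a custom query point (hypothetical 10% larger weight).
customPoint = queryPoint;
customPoint.Weight = 1.1*queryPoint.Weight;
customExplainer = fit(explainer,customPoint);

% Collect both sets of values in one table for comparison.
comparison = table(explainer.ShapleyValues.Predictor, ...
    explainer.ShapleyValues.ShapleyValue, ...
    customExplainer.ShapleyValues.ShapleyValue, ...
    'VariableNames',{'Predictor','Original','Custom'});
disp(comparison)
```

Both explainers share the same `Intercept` value, because the average prediction does not depend on the query point.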

Export Shapley Results

After computing Shapley values, you can export your results by using any of the following options in the Export section on the Explain tab.

• To export the Shapley explanations bar graph to a figure, click the button for exporting the plot to a figure.

• To export the Shapley explanations table to the workspace, click the export results button and select Export Results Table.

• To export the query point model explainer object to the workspace, click the export results button and select Export Results Object. If you specify a custom query point by using what-if analysis, the model explainer object corresponds to the custom query point. For more information on the explainer object, see `shapley`.

Interpret Model Using Partial Dependence Plots

Partial dependence plots (PDPs) allow you to visualize the marginal effect of each predictor on the predicted response of a trained regression model. After you train a model in Regression Learner, you can view a partial dependence plot for the model. On the Explain tab, in the Global Explanations section, click the partial dependence plot button. When computing partial dependence values, the app uses the final model, trained on the full data set (including training and validation data, but excluding test data).

To investigate your results, use the controls on the right.

• Under Data, choose whether to plot results using Training set data or Test set data. The training set refers to the data used to train the final model and includes all the observations that are not reserved for testing.

• Under Feature, choose the feature to plot using the X list. The plotted line corresponds to the average predicted response across the predictor values. The x-axis tick marks in the plot correspond to the unique predictor values in the selected data set.

If you use PCA to train a model, you can select principal components from the X list.

• Zoom in and out, or pan across the plot. To enable zooming or panning, place the mouse over the PDP and click the corresponding button on the toolbar that appears above the top right of the plot.
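To make the idea of an average predicted response concrete, the following sketch computes a partial dependence curve by hand. It is a simplified illustration, not the app's algorithm. It assumes the `carbig` sample data and a regression tree, and it evaluates the curve on an evenly spaced grid of 20 values rather than on the unique predictor values.

```matlab
% Illustrative setup (assumption): carbig sample data and a regression tree.
load carbig
tbl = rmmissing(table(Acceleration,Displacement,Horsepower,Weight,MPG));
tblX = removevars(tbl,"MPG");
mdl = fitrtree(tblX,tbl.MPG);

% For each grid value, set Weight to that value for every observation
% and average the model predictions over the data set.
xGrid = linspace(min(tblX.Weight),max(tblX.Weight),20);
pd = zeros(size(xGrid));
for k = 1:numel(xGrid)
    tmp = tblX;
    tmp.Weight(:) = xGrid(k);
    pd(k) = mean(predict(mdl,tmp));
end

plot(xGrid,pd)
xlabel("Weight")
ylabel("Average predicted MPG")
```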

For an example, see Use Partial Dependence Plots to Interpret Regression Models Trained in Regression Learner App. For more information on partial dependence plots, see `plotPartialDependence`.
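At the command line, a similar plot can be created with `plotPartialDependence`, and the underlying values can be retrieved with `partialDependence`. The sketch below assumes the `carbig` sample data and a regression tree as a stand-in for a model trained in the app; the variable names are illustrative.

```matlab
% Illustrative setup (assumption): carbig sample data and a regression tree.
load carbig
tbl = rmmissing(table(Acceleration,Displacement,Horsepower,Weight,MPG));
tblX = removevars(tbl,"MPG");
mdl = fitrtree(tblX,tbl.MPG);

% Plot the partial dependence of the predicted response on Weight.
plotPartialDependence(mdl,"Weight")

% Retrieve the partial dependence values and the query points they refer to.
[pd,x] = partialDependence(mdl,"Weight");
```

Both functions also accept a data argument after the predictor name, which is one way to evaluate partial dependence on a data set other than the one stored in the model.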

To export PDPs you create in the app to figures, see Export Plots in Regression Learner App.