Interpretability and Explainability for Credit Scoring

This example shows different techniques for interpreting and explaining the logic behind credit scoring predictions.

While credit scorecard models, in general, are straightforward to interpret, this example uses a black-box model, without revealing the logic, to show the workflow for explaining predictions. In this example, you work with the creditscorecard object from Financial Toolbox™ and pass the scoring function to interpretability tools in Statistics and Machine Learning Toolbox™. These tools include:

These tools support regression and classification modeling, which make interpretation more efficient. For more information on these techniques, see Interpret Machine Learning Models. In this example, the score model of the creditscorecard object is used as the black-box model. For an example of this workflow, see Interpret and Stress-Test Deep Learning Networks for Probability of Default.

Background

Credit scoring is the process by which lenders assign scores to borrowers and use those scores to decide whether or not to accept a loan application. Lenders use credit scoring models to come up with these scores. Traditionally, simple, interpretable models such as credit scorecards and logistic regression have been widely used in this area. Over time, Machine Learning (ML) and Artificial Intelligence (AI) techniques were introduced to implement credit scoring models. Such techniques, while improving predictive power, also are more black-box and there is little or no explanation behind the decisions. Consequently, the credit scoring predictions of ML and AI techniques are difficult for humans to interpret. As a result, lenders are implementing different interpretability and explainability methods to get a better understanding of the logic behind the credit scoring predictions. In addition, regulators are also requiring that practitioners use more interpretability and also fairness methods to ensure that no equal opportunity laws are broken while making credit decisions. For more information on using fairness metrics, see Explore Fairness Metrics for Credit Scoring Model.

Create Credit Scorecard Model

Load credit card data and create a credit scorecard model using the creditscorecard object.

load CreditCardData
sc = creditscorecard(data,IDVar="CustID");

Apply automatic binning. This example uses a split algorithm, with a maximum of 5 bins per predictor and with the constraint that each bin has at least 50 observations. For more information, see autobinning and Bin Data to Create Credit Scorecards Using Binning Explorer.

sc = autobinning(sc,Algorithm="Split",AlgorithmOptions={"MaxNumBins",5,"MinCount",50});

Verify that the binning for the numeric variables has five bins or levels. For example, here is the bin information for the customer age predictor.

bi = bininfo(sc,"CustAge");
disp(bi)

         Bin         Good    Bad     Odds        WOE       InfoValue 
    _____________    ____    ___    ______    _________    __________

    {'[-Inf,35)'}     93      76    1.2237     -0.50255      0.038003
    {'[35,47)'  }    321     184    1.7446     -0.14791     0.0094258
    {'[47,53)'  }    194      64    3.0312      0.40456       0.03252
    {'[53,61)'  }    128      64         2    -0.011271    2.0365e-05
    {'[61,Inf]' }     67       9    7.4444        1.303      0.079183
    {'Totals'   }    803     397    2.0227          NaN       0.15915

For categorical variables, there may be fewer than five bins in total, since the split algorithm in autobinning can merge categories into a single group. For example, the residential status predictor initially has three levels: Tenant, Home Owner, and Other. The split algorithm returns only two groups.

[bi,cg] = bininfo(sc,"ResStatus");
disp(bi)

       Bin        Good    Bad     Odds        WOE       InfoValue
    __________    ____    ___    ______    _________    _________

    {'Group1'}    672     344    1.9535    -0.034802    0.0010314
    {'Group2'}    131      53    2.4717      0.20049    0.0059418
    {'Totals'}    803     397    2.0227          NaN    0.0069732

The category grouping information shows that Tenant and Home Owner are merged into Group1. This grouping means Tenant and Home Owner will get the same number of points in the final scorecard.

disp(cg)

       Category       BinNumber
    ______________    _________

    {'Tenant'    }        1    
    {'Home Owner'}        1    
    {'Other'     }        2

Fit the model coefficients using fitmodel. For illustration purposes, keep only five model predictors, including some categorical ones.

PredictorsInModel = ["CustAge" "CustIncome" "EmpStatus" "ResStatus" "UtilRate"];
sc = fitmodel(sc,PredictorVars=PredictorsInModel,VariableSelection="fullmodel",Display="off");

Scale the points so that 500 points correspond to odds of 2, and the odds double every 50 points.

sc = formatpoints(sc,PointsOddsAndPDO=[500 2 50]);

Display the scorecard using displaypoints. The credit scorecard model is a lookup table. For example, for customer age there are five bins or levels with different points for each level. A visualization of the score, as a function of age, has a piecewise constant pattern with five levels, as shown in Partial Dependence Plot. For residential status, Tenant and Home Owner are in Group1, and they get the same number of points.

[ScorecardPointsTable,MinPts,MaxPts] = displaypoints(sc);
disp(ScorecardPointsTable)

      Predictors             Bin           Points
    ______________    _________________    ______

    {'CustAge'   }    {'[-Inf,35)'    }     71.84
    {'CustAge'   }    {'[35,47)'      }    91.814
    {'CustAge'   }    {'[47,53)'      }    122.93
    {'CustAge'   }    {'[53,61)'      }    99.511
    {'CustAge'   }    {'[61,Inf]'     }    173.54
    {'CustAge'   }    {'<missing>'    }       NaN
    {'ResStatus' }    {'Group1'       }    97.318
    {'ResStatus' }    {'Group2'       }    116.43
    {'ResStatus' }    {'<missing>'    }       NaN
    {'EmpStatus' }    {'Unknown'      }    85.326
    {'EmpStatus' }    {'Employed'     }    118.11
    {'EmpStatus' }    {'<missing>'    }       NaN
    {'CustIncome'}    {'[-Inf,31000)' }    68.158
    {'CustIncome'}    {'[31000,38000)'}    102.11
    {'CustIncome'}    {'[38000,42000)'}    93.302
    {'CustIncome'}    {'[42000,47000)'}    109.18
    {'CustIncome'}    {'[47000,Inf]'  }    121.21
    {'CustIncome'}    {'<missing>'    }       NaN
    {'UtilRate'  }    {'[-Inf,0.12)'  }    106.84
    {'UtilRate'  }    {'[0.12,0.3)'   }    94.647
    {'UtilRate'  }    {'[0.3,0.39)'   }    140.95
    {'UtilRate'  }    {'[0.39,0.68)'  }    69.635
    {'UtilRate'  }    {'[0.68,Inf]'   }    94.634
    {'UtilRate'  }    {'<missing>'    }       NaN

One "traditional" approach to measure the importance of each predictor in the credit scorecard model is to compute the percent of the total score range that comes from each predictor.

PtsRange = MaxPts - MinPts;
NumPred = length(PredictorsInModel);
PercentWeight  = zeros(NumPred,1);

for ii = 1 : NumPred
   Ind = strcmpi(PredictorsInModel{ii},ScorecardPointsTable.Predictors);
   MaxPtsPred = max(ScorecardPointsTable.Points(Ind));
   MinPtsPred = min(ScorecardPointsTable.Points(Ind));
   PercentWeight(ii) = 100*(MaxPtsPred-MinPtsPred)/PtsRange;
end

PredictorWeights = table(PredictorsInModel',PercentWeight,VariableNames=["Predictor" "Weight"]);
disp(PredictorWeights)

     Predictor      Weight
    ____________    ______

    "CustAge"       36.587
    "CustIncome"    19.085
    "EmpStatus"     11.795
    "ResStatus"     6.8768
    "UtilRate"      25.656

Customer age is the main variable in the model, since it corresponds to 36% of the total score range. A customer can get anywhere from 71.8 to 173.5 points, based on their age. This range has a difference of over 100 points between the minimum and maximum values. On the other end, residential status plays a minor role in the score, with points ranging from 97.3 to 116.4 only, a difference of less than 20 points.

An alternative to this "traditional" approach is to use the following explainability techniques from Statistics and Machine Learning Toolbox: Partial Dependence Plot, Individual Conditional Expectation Plot, Local Interpretable Model-Agnostic Explanation Plot, and Shapley Values.

Partial Dependence Plot

The partial dependence plot (PDP) shows the effect of one or two variables on the predicted score.

Use the plotPartialDependence function to pass the score method of the creditscorecard object as a black-box model.

One Predictor

Select a predictor using the dropdown option.

As an example, if customer age is selected, note the piecewise constant shape of the plot, with jumps occurring at the bin edges, and with five levels in total. This is consistent with the five bins for customer age in the credit scorecard model.

predictor = PredictorsInModel(1);

plotPartialDependence(@(tbl)score(sc,tbl),predictor,data)

Figure contains an axes object. The axes object with title Partial Dependence Plot, xlabel CustAge, ylabel Y contains an object of type line.

Two Predictors

Generating a partial dependence plot with two predictors can take significantly longer than the one-predictor case. Typically, the more unique values a predictor has in the data set, the longer it takes to plot the partial dependence. Here's a report of the number of unique values in the data.

NumUniqueValuesTable = varfun(@(x)length(unique(x)),data(:,PredictorsInModel));
NumUniqueValuesTable.Properties.VariableNames = erase(NumUniqueValuesTable.Properties.VariableNames,'Fun_');
disp(NumUniqueValuesTable)

    CustAge    CustIncome    EmpStatus    ResStatus    UtilRate
    _______    __________    _________    _________    ________

      54           45            2            3          110

The categorical predictors have fewer unique levels, so these plots for categorical predictors run faster. Numeric variables like customer age are relatively discrete and so is utilization rate because this rate's values are rounded to two decimals. However, a continuous predictor (for example, the average monthly balance (AMBalance) in the data table) can have many unique values.

Select a predictor and additional predictor to then use plotPartialDependence to generate the PDP plot.

predictor = PredictorsInModel(1);
additionalPredictor = PredictorsInModel(4);

plotPartialDependence(@(tbl)score(sc,tbl),[predictor,additionalPredictor],data)

Figure contains an axes object. The axes object with title Partial Dependence Plot, xlabel CustAge, ylabel ResStatus contains an object of type surface.

Individual Conditional Expectation Plot

Similar to the partial dependence plot, the individual conditional expectation plot (ICE) shows the effect of one of the variables on the predicted score. The red line in the ICE plot matches the partial dependence plot. While the partial dependence plot shows the average score as a function of the selected predictor, the ICE plot disaggregates and shows the score for each observation (each gray line) as a function of the selected predictor. For more information, see the More About section on the plotPartialDependence reference page.

predictor = PredictorsInModel(1);

plotPartialDependence(@(tbl)score(sc,tbl),predictor, ...
    data,Conditional="absolute")

Figure contains an axes object. The axes object with title Individual Conditional Expectation Plot, xlabel CustAge, ylabel Y contains 1202 objects of type line, scatter.

Select a Query Point

The PDP and ICE plots provide a global view of the credit scorecard scores, where the score is visualized for all values of the selected predictor. In contrast, LIME and Shapley are local explainability techniques that explain the behavior of the model in a neighborhood of a query point of choice. For more information, see Interpret Machine Learning Models.

To see how a query point helps to explain credit scores, use index 92 in the training data as your query point. You can select other query points by typing an index value into the text box.

QueryPointIndex = 92; % ID number of the observation to explain

Use score to display the query point score and the points, by predictor, for this query point.

[ScoresTraining,PointsTraining] = score(sc,data);
fprintf("Selected index %d, with score %g\n",QueryPointIndex,ScoresTraining(QueryPointIndex))

Selected index 92, with score 417.289

disp(PointsTraining(QueryPointIndex,:))

    CustAge    ResStatus    EmpStatus    CustIncome    UtilRate
    _______    _________    _________    __________    ________

     71.84      97.318       85.326        68.158       94.647

The plots that follow show the location of the query point (dotted vertical line) relative to the distribution of values for the scores and for each predictor. For example, for index 92, the score is low relative to the distribution. For the customer age predictor, the query point is on the bottom group. This result is similar for the customer income, employment status, and residential status predictors. The points for the utilization rate predictor are closer to the middle of the distribution, but still below average.

figure
t = tiledlayout(3,2);
nexttile
plotQueryInHistogram("Score",QueryPointIndex,ScoresTraining,PointsTraining)
nexttile
plotQueryInHistogram("CustAge",QueryPointIndex,ScoresTraining,PointsTraining)
nexttile
plotQueryInHistogram("CustIncome",QueryPointIndex,ScoresTraining,PointsTraining)
nexttile
plotQueryInHistogram("EmpStatus",QueryPointIndex,ScoresTraining,PointsTraining)
nexttile
plotQueryInHistogram("ResStatus",QueryPointIndex,ScoresTraining,PointsTraining)
nexttile
plotQueryInHistogram("UtilRate",QueryPointIndex,ScoresTraining,PointsTraining)
title(t,"Query Point Relative to Distribution")

Local Interpretable Model-Agnostic Explanation Plot

The local interpretable model-agnostic explanation (LIME) plot shows the coefficients of a local linear model near the instance of a score that you want to explain. LIME explains the scores around a particular observation, or query point, with a simple local model, such as a linear regression model or a decision tree.

Use lime to create a lime object specifying the data set of interest (the training data set), the model "type" (use "regression" to indicate a numeric prediction), and which variables are categorical. When you create a lime object, the toolbox generates a random synthetic data set. Use the synthetic data to fit simple local models to explain the local behavior.

rng('default'); % for reproducibility
limeExplainer = lime(@(tbl)score(sc,tbl),data(:,PredictorsInModel),Type="regression", ...
    CategoricalPredictors=["ResStatus" "EmpStatus"]);

Select a maximum number of predictors (NumPredToExplain) to explain and use a SimpleModelType of a "tree" to explain the local behavior of the score. The results are sensitive to the kernel width parameter (KernelWidthChoice) that controls how much neighbor points are weighted while fitting the linear simple model.

NumPredToExplain = 5; % number of variables/predictors to explain
KernelWidthChoice = 0.5;
limeExplainer = fit(limeExplainer,data(QueryPointIndex,PredictorsInModel), ...
   NumPredToExplain,SimpleModelType="tree",KernelWidth=KernelWidthChoice);

figure
f = plot(limeExplainer);

Figure contains an axes object. The axes object with title LIME with Decision Tree Model, xlabel Predictor Importance, ylabel Predictor contains an object of type bar.

When the simple model is a tree, based on the reported predictor importance, customer age is the main predictor, followed by employment status and customer income.

Shapley Values

Shapley values explain the deviation of the predicted score from the average predicted score. The sum of the Shapley values for all predictors corresponds to the total deviation of the score for the query point from the average score.

The Shapley values are estimated based on a simulation. For larger data sets, this simulation is time consuming. For illustration purposes, this example uses only 500 rows of the training data with the shapley constructor.

rng('default'); % for reproducibility

shapleyExplainer = shapley(@(tbl)score(sc,tbl),data(500,PredictorsInModel), ...
    QueryPoint=data(QueryPointIndex,PredictorsInModel),CategoricalPredictors=["EmpStatus" "ResStatus"]);

figure
plot(shapleyExplainer)

Figure contains an axes object. The axes object with title Shapley Explanation, xlabel Shapley Value, ylabel Predictor contains an object of type bar.

For the query point with index 92, the predicted score is 417, whereas the average score for the training data set passed to shapley function is 516. You expect the Shapley values to be negative, or at least have important negative components that explain why the predicted score is below average. In contrast, for scores above average, the Shapley values add up to a positive amount. In this example, the estimated Shapley values show that the main deviation from the average is explained by the customer income and employment status predictors, followed by the customer age and utilization rate predictors. The residential status predictor is not important. This result might be a combination of the simulation itself with the fact that residential status has a smaller impact on scores for this model.

Final Remarks

Explainability techniques are widely used to understand the behavior of predictive models. In this example, a creditscorecard model shows how explainability techniques, such as PDP, ICE, LIME, and Shapley are applied to explain a black-box model. Although credit scorecard models are simple and interpretable, you can apply the explainability tools in this example to other scoring models that are treated as black-box models or to supported models in Statistics and Machine Learning Toolbox. Alternatively, instead of explaining the scores, you can pass the probdefault function as the black-box model to explain the probability of default predictions.

Local Functions

function plotQueryInHistogram(VariableChoice,QueryPointIndex,Scores,PointsTable)

if VariableChoice=="Score"
   HistData = Scores;
else
   HistData = PointsTable.(VariableChoice);
end

histogram(HistData)
hold on
xline(HistData(QueryPointIndex),':','LineWidth',2.5)
hold off
xlabel(VariableChoice)
ylabel('Frequency')

end