# screenpredictors

Screen credit scorecard predictors for predictive value

## Syntax

`metric_table = screenpredictors(data)`
`metric_table = screenpredictors(___,Name,Value)`

## Description

`metric_table = screenpredictors(data)` returns the output variable, `metric_table`, a MATLAB® table containing the calculated values for several measures of predictive power for each predictor variable in `data`. Use the `screenpredictors` function as a preprocessing step in the Credit Scorecard Modeling Workflow to reduce the number of predictor variables before you create the credit scorecard using the `creditscorecard` function from Financial Toolbox™.

`metric_table = screenpredictors(___,Name,Value)` specifies options using one or more name-value pair arguments in addition to the input arguments in the previous syntax.

## Examples

Reduce the number of predictor variables by screening predictors before you create a credit scorecard.

Use the `CreditCardData.mat` file to load the data (using a dataset from Refaat 2011).

`load CreditCardData`

Define `'IDVar'` and `'ResponseVar'`.

```idvar = 'CustID'; responsevar = 'status';```

Use `screenpredictors` to calculate the predictor screening metrics. The function returns a table containing the metric values. Each table row corresponds to a predictor from the input table `data`.

`metric_table = screenpredictors(data,'IDVar', idvar,'ResponseVar', responsevar)`
```
metric_table=9×7 table
                   InfoValue    AccuracyRatio      AUROC    Entropy       Gini    Chi2PValue    PercentMissing
                   _________    _____________    _______    _______    _______    __________    ______________

    CustAge          0.18863          0.17095    0.58547    0.88729    0.42626    0.00074524                 0
    TmWBank          0.15719          0.13612    0.56806    0.89167    0.42864     0.0054591                 0
    CustIncome       0.15572          0.17758    0.58879      0.891    0.42731     0.0018428                 0
    TmAtAddress     0.094574         0.010421    0.50521    0.90089    0.43377         0.182                 0
    UtilRate        0.075086         0.035914    0.51796    0.90405    0.43575       0.45546                 0
    AMBalance        0.07159         0.087142    0.54357    0.90446    0.43592       0.48528                 0
    EmpStatus       0.048038          0.10886    0.55443    0.90814     0.4381    0.00037823                 0
    OtherCC         0.014301         0.044459    0.52223    0.91347    0.44132      0.047616                 0
    ResStatus      0.0097738          0.05039     0.5252    0.91422    0.44182       0.27875                 0
```
`metric_table = sortrows(metric_table,'AccuracyRatio','descend')`
```
metric_table=9×7 table
                   InfoValue    AccuracyRatio      AUROC    Entropy       Gini    Chi2PValue    PercentMissing
                   _________    _____________    _______    _______    _______    __________    ______________

    CustIncome       0.15572          0.17758    0.58879      0.891    0.42731     0.0018428                 0
    CustAge          0.18863          0.17095    0.58547    0.88729    0.42626    0.00074524                 0
    TmWBank          0.15719          0.13612    0.56806    0.89167    0.42864     0.0054591                 0
    EmpStatus       0.048038          0.10886    0.55443    0.90814     0.4381    0.00037823                 0
    AMBalance        0.07159         0.087142    0.54357    0.90446    0.43592       0.48528                 0
    ResStatus      0.0097738          0.05039     0.5252    0.91422    0.44182       0.27875                 0
    OtherCC         0.014301         0.044459    0.52223    0.91347    0.44132      0.047616                 0
    UtilRate        0.075086         0.035914    0.51796    0.90405    0.43575       0.45546                 0
    TmAtAddress     0.094574         0.010421    0.50521    0.90089    0.43377         0.182                 0
```
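The `AccuracyRatio` and `AUROC` columns in the table above are consistent with the standard relationship accuracy ratio = 2 × AUROC − 1. The following sketch checks this using values copied from the displayed output. It does not call `screenpredictors`, and the variable names are illustrative only.

```matlab
% Values copied from the AUROC and AccuracyRatio columns of the sorted table.
auroc         = [0.58879 0.58547 0.56806 0.55443 0.54357  0.5252   0.52223  0.51796  0.50521];
accuracyRatio = [0.17758 0.17095 0.13612 0.10886 0.087142 0.05039  0.044459 0.035914 0.010421];

% Standard relationship between the two metrics.
arFromAuroc = 2*auroc - 1;

% Any remaining discrepancy comes from the rounding of the displayed values.
maxDiff = max(abs(arFromAuroc - accuracyRatio));
```

Because one column is a linear transformation of the other, sorting by `AccuracyRatio` or by `AUROC` produces the same ranking of predictors.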

Based on the `AccuracyRatio` metric, select the top predictors to use when you create the `creditscorecard` object.

`varlist = metric_table.Row(metric_table.AccuracyRatio > 0.09)`
```
varlist = 4x1 cell
    {'CustIncome'}
    {'CustAge'   }
    {'TmWBank'   }
    {'EmpStatus' }
```

Use `creditscorecard` to create a `creditscorecard` object based on only the "screened" predictors.

`sc = creditscorecard(data,'IDVar', idvar,'ResponseVar', responsevar, 'PredictorVars', varlist)`
```
sc = 
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {1x11 cell}
        NumericPredictors: {'CustAge'  'CustIncome'  'TmWBank'}
    CategoricalPredictors: {'EmpStatus'}
           BinMissingData: 0
                    IDVar: 'CustID'
            PredictorVars: {'CustAge'  'EmpStatus'  'CustIncome'  'TmWBank'}
                     Data: [1200x11 table]
```

## Input Arguments

`data`: Data for the `creditscorecard` object, specified as a MATLAB table, tall table, or tall timetable, where each column of data can be any one of the following data types:

• Numeric

• Logical

• Cell array of character vectors

• Character array

• Categorical

• String

Data Types: `table`

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `metric_table = screenpredictors(data,'IDVar','CustID','ResponseVar','status','PredictorVars',{'CustAge','CustIncome'})`

Name of identifier variable, specified as the comma-separated pair consisting of `'IDVar'` and a case-sensitive character vector. The `'IDVar'` data can be ordinal numbers or Social Security numbers. Specifying `'IDVar'` excludes the identifier variable from the set of predictor variables.

Data Types: `char`

Response variable name for the “Good” or “Bad” indicator, specified as the comma-separated pair consisting of `'ResponseVar'` and a case-sensitive character vector. The response variable data must be binary.

If not specified, `'ResponseVar'` is set to the last column of the input `data` by default.

Data Types: `char`

Names of predictor variables, specified as the comma-separated pair consisting of `'PredictorVars'` and a case-sensitive cell array of character vectors or string array. By default, when you create a `creditscorecard` object, all variables are predictors except for `IDVar` and `ResponseVar`. Any name you specify using `'PredictorVars'` must differ from the `IDVar` and `ResponseVar` names.

Data Types: `cell` | `string`
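The documented defaults (the last column as the response, and every variable other than `IDVar` and `ResponseVar` as a predictor) can be illustrated with plain table operations. This is a minimal sketch on a hypothetical three-row table. It mirrors the described behavior and is not the function's internal code.

```matlab
% Hypothetical table that mimics the layout of the example data.
data = table([1;2;3],[34;51;27],[40000;52000;31000],[0;1;0], ...
    'VariableNames',{'CustID','CustAge','CustIncome','status'});

idVar = 'CustID';

% Documented default: the response is the last column of the input table.
responseVar = data.Properties.VariableNames{end};

% Remaining variables are the predictors; 'stable' preserves column order.
predictorVars = setdiff(data.Properties.VariableNames,{idVar,responseVar},'stable');
```

Here `predictorVars` contains `'CustAge'` and `'CustIncome'`, which is the list you would otherwise pass explicitly through `'PredictorVars'`.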

Name of weights variable, specified as the comma-separated pair consisting of `'WeightsVar'` and a case-sensitive character vector to indicate which column name in the `data` table contains the row weights.

If you do not specify `'WeightsVar'` when you create a `creditscorecard` object, then the function uses unit weights as the observation weights.

Data Types: `char`
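To see what unit weights mean in practice, the sketch below builds weighted good and bad frequencies per bin with `accumarray`. The bin assignments, responses, and weights are hypothetical, and how `screenpredictors` applies weights internally is an assumption here.

```matlab
% Hypothetical binned predictor (bin index per row) and binary response (1 = bad).
binIdx   = [1;1;2;2;2;3];
response = [0;1;0;0;1;1];

% Unit weights: every observation counts once, so frequencies are plain counts.
wUnit = ones(size(response));
goodCounts = accumarray(binIdx, wUnit .* (response == 0))';
badCounts  = accumarray(binIdx, wUnit .* (response == 1))';

% Non-unit weights: each row contributes its weight instead of 1.
wRow = [2;1;1;1;3;1];
goodWeighted = accumarray(binIdx, wRow .* (response == 0))';
badWeighted  = accumarray(binIdx, wRow .* (response == 1))';
```

With unit weights the frequency table equals the raw observation counts. With row weights, heavily weighted observations shift the good and bad distributions that the screening metrics are computed from.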

Number of equal-frequency bins for numeric predictors, specified as the comma-separated pair consisting of `'NumBins'` and a numeric scalar.

Data Types: `double`
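Equal-frequency binning places roughly the same number of observations in each bin. The following is a minimal rank-based sketch on hypothetical data. The exact algorithm that `screenpredictors` uses (for example, how it handles ties or sample sizes that do not divide evenly) is not specified here, so treat this as an illustration of the idea only.

```matlab
% Hypothetical numeric predictor with 12 distinct observations.
x = [23 45 31 52 38 29 61 47 35 41 56 27];
numBins = 4;

% Rank each observation (1 = smallest value).
[~,order] = sort(x);
ranks = zeros(size(x));
ranks(order) = 1:numel(x);

% Map ranks to bins so that each bin covers an equal share of the ranks.
binIdx = ceil(ranks * numBins / numel(x));

% Number of observations that fall in each bin.
counts = accumarray(binIdx(:),1)';
```

In this example every bin receives three observations. Unlike equal-width binning, the bin edges adapt to the data, so skewed predictors do not leave some bins nearly empty.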

Small shift in frequency tables that contain zero entries, specified as the comma-separated pair consisting of `'FrequencyShift'` and a scalar numeric with a value between `0` and `1`.

If the frequency table of a predictor contains any "pure" bins (containing all goods or all bads) after you bin the data using `autobinning`, then the function adds the `'FrequencyShift'` value to all bins in the table. To avoid any perturbation, set `'FrequencyShift'` to `0`.

Data Types: `double`
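A "pure" bin causes trouble for any metric that takes the logarithm of a ratio of good and bad proportions. The sketch below uses weight of evidence (the log ratio of the two proportions) on a hypothetical frequency table to show the effect of adding a shift to every bin. The shift value and the use of weight of evidence as the illustration are assumptions, not a description of the function's internals.

```matlab
% Hypothetical frequency table: row 1 is goods, row 2 is bads, columns are bins.
% The third bin is "pure" because it contains no bads.
freq = [40 35 25;
        10 15  0];

% Without a shift, the pure bin yields an infinite log ratio.
pGood  = freq(1,:) / sum(freq(1,:));
pBad   = freq(2,:) / sum(freq(2,:));
woeRaw = log(pGood ./ pBad);

% Add a small shift to all bins in the table, then recompute.
frequencyShift = 0.5;                 % illustrative value between 0 and 1
shifted    = freq + frequencyShift;
pGoodS     = shifted(1,:) / sum(shifted(1,:));
pBadS      = shifted(2,:) / sum(shifted(2,:));
woeShifted = log(pGoodS ./ pBadS);
```

After the shift, every entry of `woeShifted` is finite. The shift slightly perturbs all bins, which is why setting `'FrequencyShift'` to `0` is the way to avoid any perturbation.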

## Output Arguments


`metric_table`: Calculated values for the predictor screening metrics, returned as a table. Each table row corresponds to a predictor from the input table `data`. The table columns contain calculated values for the following metrics:

• `'InfoValue'` — Information value. This metric measures the strength of a predictor in the fitting model by determining the deviation between the distributions of `"Goods"` and `"Bads"`.

• `'AccuracyRatio'` — Accuracy ratio.

• `'AUROC'` — Area under the ROC curve.

• `'Entropy'` — Entropy. This metric measures the level of unpredictability in the bins. You can use the entropy metric to validate a risk model.

• `'Gini'` — Gini. This metric measures the statistical dispersion or inequality within a sample of data.

• `'Chi2PValue'` — Chi-square p-value. This metric is computed from the chi-square metric and is a measure of the statistical difference and independence between groups.

• `'PercentMissing'` — Percentage of missing values in the predictor. This metric is expressed in decimal form.
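To make several of the metrics above concrete, the following sketch computes information value, entropy, and Gini from a hypothetical frequency table of goods and bads per bin. It uses common textbook definitions (natural log for information value, base-2 log for entropy, and bin-share weighting for entropy and Gini). Whether `screenpredictors` uses exactly these formulas is an assumption.

```matlab
% Hypothetical frequency table: row 1 is goods, row 2 is bads, columns are bins.
freq = [40 35 25;
        10 15 20];

binTotals = sum(freq,1);                  % observations per bin
binShare  = binTotals / sum(binTotals);   % fraction of all observations per bin

% Information value: deviation between the distributions of goods and bads.
pGood = freq(1,:) / sum(freq(1,:));
pBad  = freq(2,:) / sum(freq(2,:));
infoValue = sum((pGood - pBad) .* log(pGood ./ pBad));

% Within-bin proportions of goods and bads (no zero entries in this example,
% so log2 is well defined everywhere).
p = freq ./ binTotals;

% Entropy: unpredictability within each bin, averaged by bin share.
binEntropy = -sum(p .* log2(p),1);
entropyValue = sum(binShare .* binEntropy);

% Gini: impurity within each bin, averaged by bin share.
binGini = 1 - sum(p.^2,1);
giniValue = sum(binShare .* binGini);
```

Under these definitions a stronger predictor has a higher information value and lower entropy and Gini, because its bins separate goods from bads more cleanly. For a binary response, entropy is bounded by 1 and Gini by 0.5.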