screenpredictors

Screen credit scorecard predictors for predictive value

Syntax

metric_table = screenpredictors(data)
metric_table = screenpredictors(___,Name,Value)

Description

metric_table = screenpredictors(data) returns metric_table, a MATLAB® table containing calculated values for several measures of predictive power for each predictor variable in data. Use screenpredictors as a preprocessing step in the Credit Scorecard Modeling Workflow (Financial Toolbox) to reduce the number of predictor variables before you create the credit scorecard with the creditscorecard function from Financial Toolbox™.

metric_table = screenpredictors(___,Name,Value) specifies options using one or more name-value pair arguments in addition to the input arguments in the previous syntax.

Examples

Reduce the number of predictor variables by screening predictors before you create a credit scorecard.

Use the CreditCardData.mat file to load the data (using a dataset from Refaat 2011).

load CreditCardData

Define 'IDVar' and 'ResponseVar'.

idvar = 'CustID';
responsevar = 'status';

Use screenpredictors to calculate the predictor screening metrics. The function returns a table containing the metric values. Each table row corresponds to a predictor from the input table data.

metric_table = screenpredictors(data,'IDVar', idvar,'ResponseVar', responsevar)
metric_table=9×7 table
                   InfoValue    AccuracyRatio     AUROC     Entropy     Gini      Chi2PValue    PercentMissing
                   _________    _____________    _______    _______    _______    __________    ______________

    CustAge          0.18863       0.17095       0.58547    0.88729    0.42626    0.00074524          0       
    TmWBank          0.15719       0.13612       0.56806    0.89167    0.42864     0.0054591          0       
    CustIncome       0.15572       0.17758       0.58879      0.891    0.42731     0.0018428          0       
    TmAtAddress     0.094574      0.010421       0.50521    0.90089    0.43377         0.182          0       
    UtilRate        0.075086      0.035914       0.51796    0.90405    0.43575       0.45546          0       
    AMBalance        0.07159      0.087142       0.54357    0.90446    0.43592       0.48528          0       
    EmpStatus       0.048038       0.10886       0.55443    0.90814     0.4381    0.00037823          0       
    OtherCC         0.014301      0.044459       0.52223    0.91347    0.44132      0.047616          0       
    ResStatus      0.0097738       0.05039        0.5252    0.91422    0.44182       0.27875          0       

Sort the metrics table by AccuracyRatio in descending order.

metric_table = sortrows(metric_table,'AccuracyRatio','descend')
metric_table=9×7 table
                   InfoValue    AccuracyRatio     AUROC     Entropy     Gini      Chi2PValue    PercentMissing
                   _________    _____________    _______    _______    _______    __________    ______________

    CustIncome       0.15572       0.17758       0.58879      0.891    0.42731     0.0018428          0       
    CustAge          0.18863       0.17095       0.58547    0.88729    0.42626    0.00074524          0       
    TmWBank          0.15719       0.13612       0.56806    0.89167    0.42864     0.0054591          0       
    EmpStatus       0.048038       0.10886       0.55443    0.90814     0.4381    0.00037823          0       
    AMBalance        0.07159      0.087142       0.54357    0.90446    0.43592       0.48528          0       
    ResStatus      0.0097738       0.05039        0.5252    0.91422    0.44182       0.27875          0       
    OtherCC         0.014301      0.044459       0.52223    0.91347    0.44132      0.047616          0       
    UtilRate        0.075086      0.035914       0.51796    0.90405    0.43575       0.45546          0       
    TmAtAddress     0.094574      0.010421       0.50521    0.90089    0.43377         0.182          0       

Based on the AccuracyRatio metric, select the top predictors to use when you create the creditscorecard object.

varlist = metric_table.Row(metric_table.AccuracyRatio > 0.09)
varlist = 4x1 cell array
    {'CustIncome'}
    {'CustAge'   }
    {'TmWBank'   }
    {'EmpStatus' }
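
The 0.09 threshold on AccuracyRatio is one possible selection rule. As a minimal sketch of an alternative, the code below rebuilds two columns of the metrics table from the values displayed above and keeps predictors that pass a combined rule. The cutoffs (InfoValue above 0.1, Chi2PValue below 0.05) and the names T, keep, and varlistAlt are assumptions chosen for illustration, not recommendations from the toolbox. The rest of this example continues with the AccuracyRatio-based varlist.

```matlab
% Values copied from the sorted metric_table output shown above
names = {'CustIncome';'CustAge';'TmWBank';'EmpStatus';'AMBalance'; ...
         'ResStatus';'OtherCC';'UtilRate';'TmAtAddress'};
InfoValue  = [0.15572; 0.18863; 0.15719; 0.048038; 0.07159; ...
              0.0097738; 0.014301; 0.075086; 0.094574];
Chi2PValue = [0.0018428; 0.00074524; 0.0054591; 0.00037823; 0.48528; ...
              0.27875; 0.047616; 0.45546; 0.182];
T = table(InfoValue, Chi2PValue, 'RowNames', names);

% Hypothetical rule: high information value AND a small chi-squared p-value
keep = T.InfoValue > 0.1 & T.Chi2PValue < 0.05;
varlistAlt = T.Row(keep);
disp(varlistAlt)
```

With these cutoffs, EmpStatus is dropped because its information value is low, even though its accuracy ratio passes the 0.09 threshold.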

Use creditscorecard to create a creditscorecard object based on only the "screened" predictors.

sc = creditscorecard(data,'IDVar', idvar,'ResponseVar', responsevar, 'PredictorVars', varlist)
sc = 
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {1x11 cell}
        NumericPredictors: {'CustAge'  'CustIncome'  'TmWBank'}
    CategoricalPredictors: {'EmpStatus'}
           BinMissingData: 0
                    IDVar: 'CustID'
            PredictorVars: {'CustAge'  'EmpStatus'  'CustIncome'  'TmWBank'}
                     Data: [1200x11 table]

Input Arguments

Input data (data) for predictor screening and for the creditscorecard object, specified as a MATLAB table, where each column can be any one of the following data types:

  • Numeric

  • Logical

  • Cell array of character vectors

  • Character array

  • Categorical

  • String

Data Types: table

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: metric_table = screenpredictors(data,'IDVar','CustID','ResponseVar','status','PredictorVars',{'CustAge','CustIncome'})

Name of identifier variable, specified as the comma-separated pair consisting of 'IDVar' and a case-sensitive character vector. The 'IDVar' data can be ordinal numbers or Social Security numbers. Specifying 'IDVar' excludes the identifier variable from the predictor variables.

Data Types: char

Response variable name for the “Good” or “Bad” indicator, specified as the comma-separated pair consisting of 'ResponseVar' and a case-sensitive character vector. The response variable data must be binary.

If not specified, 'ResponseVar' is set to the last column of the input data by default.

Data Types: char

Names of predictor variables, specified as the comma-separated pair consisting of 'PredictorVars' and a case-sensitive cell array of character vectors or string array. By default, when you create a creditscorecard object, all variables are predictors except for IDVar and ResponseVar. Any name you specify using 'PredictorVars' must differ from the IDVar and ResponseVar names.

Data Types: cell | string

Name of weights variable, specified as the comma-separated pair consisting of 'WeightsVar' and a case-sensitive character vector to indicate which column name in the data table contains the row weights.

If you do not specify 'WeightsVar', the function uses unit weights as the observation weights.

Data Types: char
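
To illustrate how row weights can change the counts behind the screening metrics, the following sketch builds a Good/Bad frequency table twice: once with unit weights and once with explicit weights. The variable names (binIdx, response, weights) and the data are hypothetical, the response coding (0 for good, 1 for bad) is an assumption, and the code is not taken from the screenpredictors implementation.

```matlab
% Hypothetical data: bin membership, binary response, and row weights
binIdx   = [1 1 2 2 3 3]';
response = [0 1 0 0 1 1]';          % assumed coding: 0 = good, 1 = bad
weights  = [1 2 1 1 0.5 1.5]';

% Column 1 accumulates goods, column 2 accumulates bads
subs = [binIdx, response + 1];

% Unit weights: every row counts once
unitFreq = accumarray(subs, 1);

% Explicit weights: every row contributes its weight
weightedFreq = accumarray(subs, weights);

disp(unitFreq)
disp(weightedFreq)
```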

Number of (equal frequency) bins for numeric predictors, specified as the comma-separated pair consisting of 'NumBins' and a scalar numeric.

Data Types: double
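
As a rough illustration of equal-frequency binning, the sketch below assigns the values of a numeric predictor to bins that each hold about the same number of observations. It uses only base MATLAB functions. The data, the variable names (x, numBins, binIdx), and the rank-based approach are assumptions for illustration, and the binning that screenpredictors performs internally may differ in detail.

```matlab
% Hypothetical numeric predictor with 12 distinct observations
x = [23 45 31 52 38 27 61 49 35 42 29 57]';
numBins = 4;                        % illustrative choice, not a documented default

% Rank each observation (1 = smallest value)
[~, order] = sort(x);
ranks = zeros(size(x));
ranks(order) = (1:numel(x))';

% Map ranks to bins so each bin receives about numel(x)/numBins observations
binIdx = ceil(ranks * numBins / numel(x));

% Count the observations that fall in each bin
binCounts = accumarray(binIdx, 1)';
disp(binCounts)
```

Because the bins are defined by rank rather than by value range, each bin here receives three observations regardless of how the values are spread.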

Small shift in frequency tables that contain zero entries, specified as the comma-separated pair consisting of 'FrequencyShift' and a scalar numeric with a value between 0 and 1.

If the frequency table of a predictor contains any "pure" bins (containing all goods or all bads) after you bin the data using autobinning, then the function adds the 'FrequencyShift' value to all bins in the table. To avoid any perturbation, set 'FrequencyShift' to 0.

Data Types: double
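
The effect of the shift can be sketched with a small frequency table that contains a pure bin. In the code below, the table values, the shift of 0.5, and all variable names are hypothetical. The log-ratio and information value formulas are the common textbook forms and are assumed here rather than confirmed as the exact internal computation.

```matlab
% Hypothetical frequency table: rows are bins, columns are [Good Bad] counts
freq = [40 10;
        30  0;                      % "pure" bin: contains no bads
        20 15];
frequencyShift = 0.5;               % illustrative value between 0 and 1

% Without a shift, the pure bin yields log(pGood/0) = Inf
pGood  = freq(:,1) / sum(freq(:,1));
pBad   = freq(:,2) / sum(freq(:,2));
woeRaw = log(pGood ./ pBad);

% If any bin is pure, add the shift to every entry of the table
if any(freq(:) == 0)
    freqShifted = freq + frequencyShift;
else
    freqShifted = freq;
end
pGoodS = freqShifted(:,1) / sum(freqShifted(:,1));
pBadS  = freqShifted(:,2) / sum(freqShifted(:,2));
woeShifted = log(pGoodS ./ pBadS);

% The information value is finite once the zero entry is removed
ivShifted = sum((pGoodS - pBadS) .* woeShifted);
disp(ivShifted)
```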

Output Arguments

Calculated values for the predictor screening metrics, returned as a table. Each table row corresponds to a predictor from the input table data. The table columns contain calculated values for the following metrics:

  • 'InfoValue' — Information value. This metric measures the strength of a predictor in the fitting model by determining the deviation between the distributions of "Goods" and "Bads".

  • 'AccuracyRatio' — Accuracy ratio.

  • 'AUROC' — Area under the ROC curve.

  • 'Entropy' — Entropy. This metric measures the level of unpredictability in the bins. You can use the entropy metric to validate a risk model.

  • 'Gini' — Gini. This metric measures the statistical dispersion or inequality within a sample of data.

  • 'Chi2PValue' — Chi-squared p-value. This metric is computed from the chi-squared statistic and measures the statistical difference and independence between groups.

  • 'PercentMissing' — Fraction of missing values in the predictor, expressed in decimal form rather than as a percentage (for example, 0.05 means 5%).
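
Several of the metrics listed above can be sketched from a binned Good/Bad frequency table. The code below uses common textbook definitions: information value as a sum over bins, and entropy and Gini as bin-size-weighted averages of the per-bin values. The frequency table and variable names are hypothetical, and the exact formulas inside screenpredictors (for example, the logarithm base) are assumed rather than confirmed. The last lines check the standard relation AccuracyRatio = 2*AUROC - 1 against the CustAge row of the example output.

```matlab
% Hypothetical frequency table: rows are bins, columns are [Good Bad] counts
freq = [50 10;
        30 20;
        20 30];
nBin   = sum(freq, 2);              % observations per bin
nTotal = sum(nBin);                 % total observations

% Information value: deviation between the Good and Bad distributions
pGood = freq(:,1) / sum(freq(:,1));
pBad  = freq(:,2) / sum(freq(:,2));
infoValue = sum((pGood - pBad) .* log(pGood ./ pBad));

% Class proportions within each bin (no empty cells, so log2 stays finite)
p = freq ./ nBin;

% Entropy: weighted average of the per-bin entropies (base-2 logarithm assumed)
binEntropy    = -sum(p .* log2(p), 2);
entropyMetric = sum((nBin / nTotal) .* binEntropy);

% Gini: weighted average of the per-bin impurities
binGini    = 1 - sum(p.^2, 2);
giniMetric = sum((nBin / nTotal) .* binGini);

% Accuracy ratio from AUROC, using the CustAge value from the example table
auroc = 0.58547;
accuracyRatio = 2 * auroc - 1;      % agrees with the tabulated 0.17095 up to rounding

fprintf('IV = %.4f, Entropy = %.4f, Gini = %.4f, AR = %.5f\n', ...
    infoValue, entropyMetric, giniMetric, accuracyRatio);
```

Under these definitions, a stronger predictor has a higher information value and lower entropy and Gini values, which matches the ordering in the example table above.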

Introduced in R2019a