# fitmodel

Fit logistic regression model to Weight of Evidence (WOE) data

## Syntax

`sc = fitmodel(sc)`

`[sc,mdl] = fitmodel(sc)`

`[sc,mdl] = fitmodel(___,Name,Value)`

## Description

`sc = fitmodel(sc)` fits a logistic regression model to the Weight of Evidence (WOE) data and stores the model predictor names and corresponding coefficients in the `creditscorecard` object. `fitmodel` internally transforms all the predictor variables into WOE values, using the bins found with the automatic or manual binning process. The response variable is mapped so that "Good" is `1` and "Bad" is `0`. This implies that higher (unscaled) scores correspond to better (less risky) individuals (smaller probability of default).

Alternatively, you can use `setmodel` to provide names of the predictors that you want in the logistic regression model, along with their corresponding coefficients.

`[sc,mdl] = fitmodel(sc)` fits a logistic regression model to the Weight of Evidence (WOE) data and stores the model predictor names and corresponding coefficients in the `creditscorecard` object. `fitmodel` returns an updated `creditscorecard` object and a `GeneralizedLinearModel` object containing the fitted model.

`fitmodel` internally transforms all the predictor variables into WOE values, using the bins found with the automatic or manual binning process. The response variable is mapped so that "Good" is `1` and "Bad" is `0`. This implies that higher (unscaled) scores correspond to better (less risky) individuals (smaller probability of default).

Alternatively, you can use `setmodel` to provide names of the predictors that you want in the logistic regression model, along with their corresponding coefficients.

`[sc,mdl] = fitmodel(___,Name,Value)` fits a logistic regression model to the Weight of Evidence (WOE) data using optional name-value pair arguments and stores the model predictor names and corresponding coefficients in the `creditscorecard` object. Using name-value pair arguments, you can select which generalized linear model to fit to the data. `fitmodel` returns an updated `creditscorecard` object and a `GeneralizedLinearModel` object containing the fitted model.
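The `setmodel` alternative mentioned in the descriptions above can be sketched as follows. This is a minimal illustration, not a definitive workflow: it assumes the toolbox that provides `creditscorecard` and the `CreditCardData.mat` sample file are available, and it assumes `setmodel` accepts the intercept as the first entry of the coefficient vector. The variable names `modelPredictors`, `modelCoefficients`, and `sc2` are hypothetical.

```matlab
% Fit once with fitmodel, then hand the same predictors and coefficients
% to a second scorecard through setmodel.
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');
sc = autobinning(sc);

% Fit quietly and keep the GeneralizedLinearModel output.
[sc,mdl] = fitmodel(sc,'Display','Off');

% Coefficient names start with '(Intercept)'; the rest are predictor names.
modelPredictors   = mdl.CoefficientNames(2:end);
modelCoefficients = mdl.Coefficients.Estimate;   % intercept first (assumption)

% Second scorecard with the same bins, model supplied manually.
sc2 = creditscorecard(data,'IDVar','CustID');
sc2 = autobinning(sc2);
sc2 = setmodel(sc2,modelPredictors,modelCoefficients);
```

Supplying the model this way is useful when the coefficients come from a fit performed outside `fitmodel`.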

## Examples

Create a `creditscorecard` object using the `CreditCardData.mat` file to load the `data` (using a dataset from Refaat 2011).

```
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID')
```

```
sc = 
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {1x11 cell}
        NumericPredictors: {1x6 cell}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
           BinMissingData: 0
                    IDVar: 'CustID'
            PredictorVars: {1x9 cell}
                     Data: [1200x11 table]
```

Perform automatic binning.

`sc = autobinning(sc)`
```
sc = 
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {1x11 cell}
        NumericPredictors: {1x6 cell}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
           BinMissingData: 0
                    IDVar: 'CustID'
            PredictorVars: {1x9 cell}
                     Data: [1200x11 table]
```

Use `fitmodel` to fit a logistic regression model using Weight of Evidence (WOE) data. `fitmodel` internally transforms all the predictor variables into WOE values, using the bins found with the automatic binning process. `fitmodel` then fits a logistic regression model using a stepwise method (by default).

`sc = fitmodel(sc);`
```
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1441.994, Chi2Stat = 5.3511754, PValue = 0.020708306
6. Adding ResStatus, Deviance = 1437.8756, Chi2Stat = 4.118404, PValue = 0.042419078
7. Adding OtherCC, Deviance = 1433.707, Chi2Stat = 4.1686018, PValue = 0.041179769

Generalized linear regression model:
    status ~ [Linear formula with 8 terms in 7 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate    SE          tStat     pValue
                   ________    ________    ______    __________

    (Intercept)    0.70239     0.064001    10.975    5.0538e-28
    CustAge        0.60833     0.24932     2.44      0.014687
    ResStatus      1.377       0.65272     2.1097    0.034888
    EmpStatus      0.88565     0.293       3.0227    0.0025055
    CustIncome     0.70164     0.21844     3.2121    0.0013179
    TmWBank        1.1074      0.23271     4.7589    1.9464e-06
    OtherCC        1.0883      0.52912     2.0569    0.039696
    AMBalance      1.045       0.32214     3.2439    0.0011792

1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 89.7, p-value = 1.4e-16
```

Use the `CreditCardData.mat` file to load the data (`dataWeights`) that contains a column (`RowWeights`) for the weights (using a dataset from Refaat 2011).

`load CreditCardData`

Create a `creditscorecard` object using the optional name-value pair argument for `'WeightsVar'`.

`sc = creditscorecard(dataWeights,'IDVar','CustID','WeightsVar','RowWeights')`
```
sc = 
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: 'RowWeights'
                 VarNames: {1x12 cell}
        NumericPredictors: {1x6 cell}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
           BinMissingData: 0
                    IDVar: 'CustID'
            PredictorVars: {1x9 cell}
                     Data: [1200x12 table]
```

Perform automatic binning.

`sc = autobinning(sc)`
```
sc = 
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: 'RowWeights'
                 VarNames: {1x12 cell}
        NumericPredictors: {1x6 cell}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
           BinMissingData: 0
                    IDVar: 'CustID'
            PredictorVars: {1x9 cell}
                     Data: [1200x12 table]
```

Use `fitmodel` to fit a logistic regression model using Weight of Evidence (WOE) data. `fitmodel` internally transforms all the predictor variables into WOE values, using the bins found with the automatic binning process. `fitmodel` then fits a logistic regression model using a stepwise method (by default). When the optional name-value pair argument `'WeightsVar'` is used to specify observation (sample) weights, the `mdl` output uses the weighted counts with `stepwiseglm` and `fitglm`.

`[sc,mdl] = fitmodel(sc);`
```
1. Adding CustIncome, Deviance = 764.3187, Chi2Stat = 15.81927, PValue = 6.968927e-05
2. Adding TmWBank, Deviance = 751.0215, Chi2Stat = 13.29726, PValue = 0.0002657942
3. Adding AMBalance, Deviance = 743.7581, Chi2Stat = 7.263384, PValue = 0.007037455

Generalized linear regression model:
    logit(status) ~ 1 + CustIncome + TmWBank + AMBalance
    Distribution = Binomial

Estimated Coefficients:
                   Estimate    SE          tStat     pValue
                   ________    ________    ______    __________

    (Intercept)    0.70642     0.088702    7.964     1.6653e-15
    CustIncome     1.0268      0.25758     3.9862    6.7132e-05
    TmWBank        1.0973      0.31294     3.5063    0.0004543
    AMBalance      1.0039      0.37576     2.6717    0.0075464

1200 observations, 1196 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 36.4, p-value = 6.22e-08
```

Create a `creditscorecard` object using the `CreditCardData.mat` file to load the `data` (using a dataset from Refaat 2011).

```
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID')
```

```
sc = 
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {1x11 cell}
        NumericPredictors: {1x6 cell}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
           BinMissingData: 0
                    IDVar: 'CustID'
            PredictorVars: {1x9 cell}
                     Data: [1200x11 table]
```

Perform automatic binning using the `'EqualFrequency'` algorithm.

`sc = autobinning(sc,'Algorithm','EqualFrequency')`
```
sc = 
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {1x11 cell}
        NumericPredictors: {1x6 cell}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
           BinMissingData: 0
                    IDVar: 'CustID'
            PredictorVars: {1x9 cell}
                     Data: [1200x11 table]
```

Use `fitmodel` to fit a logistic regression model using Weight of Evidence (WOE) data. `fitmodel` internally transforms all the predictor variables into WOE values, using the bins found with the automatic binning process. Set the `VariableSelection` name-value pair argument to `FullModel` to specify that all predictors must be included in the fitted logistic regression model.

`sc = fitmodel(sc,'VariableSelection','FullModel');`
```
Generalized linear regression model:
    status ~ [Linear formula with 10 terms in 9 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate    SE          tStat      pValue
                   ________    ________    _______    _________

    (Intercept)    0.70262     0.063862    11.002     3.734e-28
    CustAge        0.57683     0.27064     2.1313     0.033062
    TmAtAddress    1.0653      0.55233     1.9287     0.053762
    ResStatus      1.4189      0.65162     2.1775     0.029441
    EmpStatus      0.89916     0.29217     3.0776     0.002087
    CustIncome     0.77506     0.21942     3.5323     0.0004119
    TmWBank        1.0826      0.26583     4.0727     4.648e-05
    OtherCC        1.1354      0.52827     2.1493     0.031612
    AMBalance      0.99315     0.32642     3.0425     0.0023459
    UtilRate       0.16723     0.55745     0.29999    0.76419

1200 observations, 1190 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 85.6, p-value = 1.25e-14
```

Use the `CreditCardData.mat` file to load `dataMissing`, a version of the data that contains missing values.

```
load CreditCardData.mat
head(dataMissing,5)
```

```
ans=5×11 table
    CustID    CustAge    TmAtAddress    ResStatus      EmpStatus    CustIncome    TmWBank    OtherCC    AMBalance    UtilRate    status
    ______    _______    ___________    ___________    _________    __________    _______    _______    _________    ________    ______

    1         53         62             <undefined>    Unknown      50000         55         Yes        1055.9       0.22        0
    2         61         22             Home Owner     Employed     52000         25         Yes        1161.6       0.24        0
    3         47         30             Tenant         Employed     37000         61         No         877.23       0.29        0
    4         NaN        75             Home Owner     Employed     53000         20         Yes        157.37       0.08        0
    5         68         56             Home Owner     Employed     53000         14         Yes        561.84       0.11        0
```

`fprintf('Number of rows: %d\n',height(dataMissing))`

```
Number of rows: 1200
```

`fprintf('Number of missing values CustAge: %d\n',sum(ismissing(dataMissing.CustAge)))`

```
Number of missing values CustAge: 30
```

`fprintf('Number of missing values ResStatus: %d\n',sum(ismissing(dataMissing.ResStatus)))`

```
Number of missing values ResStatus: 40
```

Use `creditscorecard` with the name-value argument `'BinMissingData'` set to `true` to bin the missing numeric or categorical data in a separate bin.

```
sc = creditscorecard(dataMissing,'IDVar','CustID','BinMissingData',true);
sc = autobinning(sc);
disp(sc)
```

```
  creditscorecard with properties:

                GoodLabel: 0
              ResponseVar: 'status'
               WeightsVar: ''
                 VarNames: {1x11 cell}
        NumericPredictors: {1x6 cell}
    CategoricalPredictors: {'ResStatus'  'EmpStatus'  'OtherCC'}
           BinMissingData: 1
                    IDVar: 'CustID'
            PredictorVars: {1x9 cell}
                     Data: [1200x11 table]
```

Display and plot bin information for numeric data for `'CustAge'` that includes missing data in a separate bin labelled `<missing>`.

```
[bi,cp] = bininfo(sc,'CustAge');
disp(bi)
```

```
    Bin            Good    Bad    Odds      WOE         InfoValue
    ___________    ____    ___    ______    ________    __________

    '[-Inf,33)'    69      52     1.3269    -0.42156    0.018993
    '[33,37)'      63      45     1.4       -0.36795    0.012839
    '[37,40)'      72      47     1.5319    -0.2779     0.0079824
    '[40,46)'      172     89     1.9326    -0.04556    0.0004549
    '[46,48)'      59      25     2.36      0.15424     0.0016199
    '[48,51)'      99      41     2.4146    0.17713     0.0035449
    '[51,58)'      157     62     2.5323    0.22469     0.0088407
    '[58,Inf]'     93      25     3.72      0.60931     0.032198
    '<missing>'    19      11     1.7273    -0.15787    0.00063885
    'Totals'       803     397    2.0227    NaN         0.087112
```

`plotbins(sc,'CustAge')`

Display and plot bin information for categorical data for `'ResStatus'` that includes missing data in a separate bin labelled `<missing>`.

```
[bi,cg] = bininfo(sc,'ResStatus');
disp(bi)
```

```
    Bin             Good    Bad    Odds      WOE          InfoValue
    ____________    ____    ___    ______    _________    __________

    'Tenant'        296     161    1.8385    -0.095463    0.0035249
    'Home Owner'    352     171    2.0585    0.017549     0.00013382
    'Other'         128     52     2.4615    0.19637      0.0055808
    '<missing>'     27      13     2.0769    0.026469     2.3248e-05
    'Totals'        803     397    2.0227    NaN          0.0092627
```

`plotbins(sc,'ResStatus')`

Use `fitmodel` to fit a logistic regression model using Weight of Evidence (WOE) data. `fitmodel` internally transforms all the predictor variables into WOE values, using the bins found with the automatic binning process. `fitmodel` then fits a logistic regression model using a stepwise method (by default).

For predictors that have missing data, there is an explicit `<missing>` bin, with a corresponding WOE value computed from the data. When using `fitmodel`, the WOE value for the `<missing>` bin is applied when performing the WOE transformation. For example, a missing value for customer age (`CustAge`) is replaced with `-0.15787`, which is the WOE value for the `<missing>` bin of the `CustAge` predictor. However, when `'BinMissingData'` is `false`, a missing value for `CustAge` remains missing (`NaN`) when applying the WOE transformation.

`[sc,mdl] = fitmodel(sc);`
```
1. Adding CustIncome, Deviance = 1490.8527, Chi2Stat = 32.588614, PValue = 1.1387992e-08
2. Adding TmWBank, Deviance = 1467.1415, Chi2Stat = 23.711203, PValue = 1.1192909e-06
3. Adding AMBalance, Deviance = 1455.5715, Chi2Stat = 11.569967, PValue = 0.00067025601
4. Adding EmpStatus, Deviance = 1447.3451, Chi2Stat = 8.2264038, PValue = 0.0041285257
5. Adding CustAge, Deviance = 1442.8477, Chi2Stat = 4.4974731, PValue = 0.033944979
6. Adding ResStatus, Deviance = 1438.9783, Chi2Stat = 3.86941, PValue = 0.049173805
7. Adding OtherCC, Deviance = 1434.9751, Chi2Stat = 4.0031966, PValue = 0.045414057

Generalized linear regression model:
    status ~ [Linear formula with 8 terms in 7 predictors]
    Distribution = Binomial

Estimated Coefficients:
                   Estimate    SE          tStat     pValue
                   ________    ________    ______    __________

    (Intercept)    0.70229     0.063959    10.98     4.7498e-28
    CustAge        0.57421     0.25708     2.2335    0.025513
    ResStatus      1.3629      0.66952     2.0356    0.04179
    EmpStatus      0.88373     0.2929      3.0172    0.002551
    CustIncome     0.73535     0.2159      3.406     0.00065929
    TmWBank        1.1065      0.23267     4.7556    1.9783e-06
    OtherCC        1.0648      0.52826     2.0156    0.043841
    AMBalance      1.0446      0.32197     3.2443    0.0011775

1200 observations, 1192 error degrees of freedom
Dispersion: 1
Chi^2-statistic vs. constant model: 88.5, p-value = 2.55e-16
```
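The WOE value that replaces a missing `CustAge` in the last example can be reproduced by hand from the Good and Bad counts in the `bininfo` table. The sketch below uses only base MATLAB and assumes the usual definition of WOE as the log ratio of each bin's share of "Good" observations to its share of "Bad" observations. The variable names are illustrative.

```matlab
% Good/Bad counts for the CustAge bins, copied from the bininfo table
% (the last entry is the <missing> bin).
good = [69 63 72 172 59 99 157 93 19];
bad  = [52 45 47  89 25 41  62 25 11];

% Share of all "Good" and all "Bad" observations that falls in each bin.
distGood = good / sum(good);
distBad  = bad  / sum(bad);

% Weight of Evidence per bin: log of the ratio of the two shares.
woe = log(distGood ./ distBad);

% Information value contribution of each bin.
infoValue = (distGood - distBad) .* woe;

% Compare with the WOE and InfoValue columns of the table.
fprintf('WOE of <missing> bin: %.5f\n', woe(end));
fprintf('Total information value: %.6f\n', sum(infoValue));
```

The value printed for the `<missing>` bin should agree with the `-0.15787` quoted in the example text, up to rounding.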

## Input Arguments

Credit scorecard model, specified as a `creditscorecard` object. Use `creditscorecard` to create a `creditscorecard` object.

### Name-Value Pair Arguments

Specify optional comma-separated pairs of `Name,Value` arguments. `Name` is the argument name and `Value` is the corresponding value. `Name` must appear inside quotes. You can specify several name and value pair arguments in any order as `Name1,Value1,...,NameN,ValueN`.

Example: `[sc,mdl] = fitmodel(sc,'VariableSelection','FullModel')`

Predictor variables for fitting the `creditscorecard` object, specified as the comma-separated pair consisting of `'PredictorVars'` and a cell array of character vectors. When provided, the `creditscorecard` object property `PredictorVars` is updated. Note that the order of predictors in the original dataset is enforced, regardless of the order in which `'PredictorVars'` is provided. When not provided, the predictors used to create the `creditscorecard` object (by using `creditscorecard`) are used.

Data Types: `cell`
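As a minimal sketch of `'PredictorVars'`, the following restricts the fit to three predictors from the sample data. It assumes the toolbox that provides `creditscorecard` and the `CreditCardData.mat` file are available; the variable name `candidates` is hypothetical, and the predictors are deliberately listed in a different order from the data columns.

```matlab
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');
sc = autobinning(sc);

% Candidate predictors, listed out of dataset order on purpose.
candidates = {'AMBalance','CustIncome','TmWBank'};

% Fit a model containing exactly these predictors, without printed output.
[sc,mdl] = fitmodel(sc,'PredictorVars',candidates, ...
    'VariableSelection','FullModel','Display','Off');

% The object property is updated; the order follows the dataset columns.
disp(sc.PredictorVars)
disp(mdl.CoefficientNames)
```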

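The variable selection method to fit the logistic regression model, specified as the comma-separated pair consisting of `'VariableSelection'` and a character vector with values `'Stepwise'` or `'FullModel'`. `Stepwise` is the default and calls the Statistics and Machine Learning Toolbox function `stepwiseglm`; the other option is: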

• `FullModel` — Fits a model with all predictor variables in the `PredictorVars` name-value pair argument and calls `fitglm`.

### Note

Only variables in the `PredictorVars` property of the `creditscorecard` object can potentially become part of the logistic regression model and only linear terms are included in this model with no interactions or any other higher-order terms.

The response variable is mapped so that “Good” is `1` and “Bad” is `0`.

Data Types: `char`

Initial model for the `Stepwise` variable selection method, specified as the comma-separated pair consisting of `'StartingModel'` and a character vector with values `'Constant'` or `'Linear'`. This option determines the initial model (constant or linear) that the Statistics and Machine Learning Toolbox function `stepwiseglm` starts with.

• `Constant` — Starts the stepwise method with an empty (constant only) model.

• `Linear` — Starts the stepwise method from a full (all predictors in) model.

### Note

`StartingModel` is used only for the `Stepwise` option of `VariableSelection` and has no effect for the `FullModel` option of `VariableSelection`.

Data Types: `char`

Indicator to display model information at command line, specified as the comma-separated pair consisting of `'Display'` and a character vector with value `'On'` or `'Off'`.

Data Types: `char`
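The `'StartingModel'` and `'Display'` options can be combined in one call. The sketch below, which assumes the toolbox that provides `creditscorecard` and the `CreditCardData.mat` file are available, runs the stepwise search from both starting points with command-line output suppressed. The names `mdlConstant` and `mdlLinear` are illustrative.

```matlab
load CreditCardData
sc = creditscorecard(data,'IDVar','CustID');
sc = autobinning(sc);

% Start the stepwise search from the constant-only model, no printed output.
[~,mdlConstant] = fitmodel(sc,'StartingModel','Constant','Display','Off');

% Start the stepwise search from the model with all predictors in.
[~,mdlLinear] = fitmodel(sc,'StartingModel','Linear','Display','Off');

% Compare which terms each search kept.
disp(mdlConstant.CoefficientNames)
disp(mdlLinear.CoefficientNames)
```

The two searches need not end at the same model, so comparing the retained terms is a quick sanity check on the selection.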

## Output Arguments

Credit scorecard model, returned as an updated `creditscorecard` object. The `creditscorecard` object contains information about the model predictors and coefficients used to fit the WOE data. For more information on using the `creditscorecard` object, see `creditscorecard`.

Fitted logistic model, returned as a `GeneralizedLinearModel` object. For more information on a `GeneralizedLinearModel` object, see `GeneralizedLinearModel`.

### Note

When creating the `creditscorecard` object with `creditscorecard`, if the optional name-value pair argument `WeightsVar` was used to specify observation (sample) weights, then `mdl` uses the weighted counts with `stepwiseglm` and `fitglm`.
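A short sketch of inspecting the `mdl` output after a weighted fit follows. It assumes the toolbox that provides `creditscorecard` and the `CreditCardData.mat` file are available, and it uses only standard `GeneralizedLinearModel` properties.

```matlab
load CreditCardData
sc = creditscorecard(dataWeights,'IDVar','CustID','WeightsVar','RowWeights');
sc = autobinning(sc);

% Fit quietly; the weights given at construction time are used in the fit.
[sc,mdl] = fitmodel(sc,'Display','Off');

% Coefficient table with Estimate, SE, tStat, and pValue columns.
disp(mdl.Coefficients)

% Summary statistics of the fitted model.
fprintf('Deviance: %.4f\n', mdl.Deviance);
fprintf('Error degrees of freedom: %d\n', mdl.DFE);
```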

## More About

### Using `fitmodel` with Weights

When observation weights are provided in the credit scorecard `data`, the weights are used to calibrate the model coefficients.

The underlying Statistics and Machine Learning Toolbox functionality for `stepwiseglm` and `fitglm` supports observation weights. The weights also affect the logistic model through the WOE values. The WOE transformation is applied to all predictors before fitting the logistic model. The observation weights directly impact the WOE values. For more information, see Using bininfo with Weights and Credit Scorecard Modeling Using Observation Weights.

Therefore, the credit scorecard points and final score depend on the observation weights through both the logistic model coefficients and the WOE values.

### Models

A logistic regression model is used in the `creditscorecard` object.

For the model, the probability of being “Bad” is given by `ProbBad = exp(-s) / (1 + exp(-s))`, where `s` is the unscaled score.
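The formula can be evaluated directly in base MATLAB. The scores below are hypothetical values chosen for illustration, with `s` taken to be an unscaled score. The value `0.70239` is the intercept estimate from the first example, which would be the score of an observation whose WOE terms are all zero, assuming `s` is the linear predictor of the fitted model.

```matlab
% Hypothetical unscaled scores, chosen only for illustration.
s = [-2 0 0.70239 3];

% Probability of being "Bad" as given by the formula above.
probBad = exp(-s) ./ (1 + exp(-s));

% Algebraically equivalent form, and the complementary probability of "Good".
probBadAlt = 1 ./ (1 + exp(s));
probGood   = 1 - probBad;

% Higher scores give a smaller probability of being "Bad".
disp(table(s', probBad', probGood', ...
    'VariableNames', {'Score','ProbBad','ProbGood'}))
```

A score of zero corresponds to a probability of 0.5, and the probability of being "Bad" decreases as the score increases, consistent with "Good" being mapped to `1`.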

## References

Anderson, R. The Credit Scoring Toolkit. Oxford University Press, 2007.

Refaat, M. Credit Risk Scorecards: Development and Implementation Using SAS. lulu.com, 2011.