Tobit

Create Tobit model object for exposure at default

Since R2021b


Description

Create and analyze a Tobit model object to calculate the exposure at default (EAD) using this workflow:

Use fitEADModel to create a Tobit model object.
Use predict to predict the EAD.
Use modelDiscrimination to return AUROC and ROC data. You can plot the results using modelDiscriminationPlot.
Use modelCalibration to return the R-squared, RMSE, correlation, and sample mean error of predicted and observed EAD data. You can plot the results using modelCalibrationPlot.

Creation

Syntax

TobitEADModel = fitEADModel(data,ModelType)

TobitEADModel = fitEADModel(___,Name=Value)

Description

TobitEADModel = fitEADModel(data,ModelType) creates a Tobit EAD model object.


TobitEADModel = fitEADModel(___,Name=Value) specifies options using one or more name-value arguments in addition to the input arguments in the previous syntax. The optional name-value arguments set the model object properties. For example, eadModel = fitEADModel(EADData,ModelType,PredictorVars={'UtilizationRate','Age','Marriage'},ConversionMeasure="ccf",DrawnVar='Drawn',LimitVar='Limit',ResponseVar='EAD') creates an eadModel object using a Tobit model type.


Input Arguments


`data` — Data for exposure at default
table

Data for exposure at default, specified as a table.

Data Types: table

`ModelType` — Model type
string with value `"Tobit"` | character vector with value `'Tobit'`

Model type, specified as a string with the value of "Tobit" or a character vector with the value of 'Tobit'.

Data Types: char | string

Name-Value Arguments


Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: eadModel = fitEADModel(EADData,ModelType,PredictorVars={'UtilizationRate','Age','Marriage'},ConversionMeasure="ccf",DrawnVar='Drawn',LimitVar='Limit',ResponseVar='EAD')

`ModelID` — User-defined model ID
`"Tobit"` (default) | string | character vector

User-defined model ID, specified as ModelID and a string or character vector. The software uses the ModelID text to format outputs, so the text is expected to be short.

Data Types: string | char

`Description` — User-defined description for model
`""` (default) | string | character vector

User-defined description for model, specified as Description and a string or character vector.

Data Types: string | char

`PredictorVars` — Predictor variables
all columns of `data` except for `ResponseVar` (default) | string array | cell array of character vectors

Predictor variables, specified as PredictorVars and a string array or cell array of character vectors. PredictorVars indicates which columns in the data input contain the predictor information. By default, PredictorVars is set to all the columns in the data input except for ResponseVar.

Data Types: string | cell

`ResponseVar` — Response variable
last column of `data` (default) | string | character vector

Response variable, specified as ResponseVar and a string or character vector. The response variable contains the EAD data and must be a numeric variable. By default, ResponseVar is set to the last column.

Data Types: string | char

`LimitVar` — Limit variable
string | character vector

Limit variable, specified as LimitVar and a string or character vector. LimitVar indicates which column in data contains the limit amount. The limit amount in the data must be a positive numeric value. The meaning of the limit depends on the loan: for a credit card, it is the credit limit; for a mortgage, it is the initial loan amount. In general, LimitVar is the maximum amount that can be borrowed.

Note

LimitVar is required when ConversionMeasure is 'ccf' or 'lcf'. For more information on CCF and LCF, see Conversion Measure Options.

Data Types: string | char

`DrawnVar` — Drawn variable
string | character vector

Drawn variable, specified as DrawnVar and a string or character vector. DrawnVar indicates which column in data contains the drawn amount. The drawn amount is the balance on the account at the time of observation, prior to default, whereas the EAD is the balance at the time of default. The drawn amount in the data can be a positive or negative numeric value.

Note

DrawnVar is required when ConversionMeasure is 'ccf'.

If the ConversionMeasure is 'lcf', DrawnVar is not required. In this case, DrawnVar is set to "".

For more information on CCF, see Conversion Measure Options.

Data Types: string | char
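As a minimal sketch of the LimitVar and DrawnVar notes, the following call fits a model on the LCF scale without passing DrawnVar. It assumes that Risk Management Toolbox is installed and that the EADData.mat sample data used in the Examples section is on the path; the variable name lcfModel is only illustrative.

```matlab
% Sketch: fit a Tobit EAD model on the LCF scale, omitting DrawnVar.
% Assumes EADData.mat (the sample data from the Examples section) is available.
load EADData.mat

lcfModel = fitEADModel(EADData,"Tobit", ...
    PredictorVars=["UtilizationRate","Age","Marriage"], ...
    ConversionMeasure="lcf", ...
    LimitVar="Limit", ...
    ResponseVar="EAD");

% Per the DrawnVar note, the property is expected to be empty ("") here.
disp(lcfModel.DrawnVar)
```

The predictors are listed explicitly because the default for PredictorVars is every column except ResponseVar, which would otherwise include the Limit and Drawn columns.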

`ConversionMeasure` — Conversion measure for EAD response values
`"ccf"` (default) | character vector with value of `'ccf'` or `'lcf'` | string with value of `"ccf"` or `"lcf"`

Conversion measure for the EAD response values, specified as ConversionMeasure and a character vector or string.

"ccf" — Credit conversion factor (CCF) is the portion of the undrawn amount that will be converted into credit. The undrawn amount is the limit minus the drawn amount. The EAD thus becomes the drawn amount plus the CCF times the limit minus the drawn amount (EAD = Drawn + CCF*(Limit - Drawn)).

Note
A Tobit model with "ccf" can be unstable.
"lcf" — Limit conversion factor (LCF) is a fraction of the limit representing the total exposure. The EAD is then defined as the LCF times the limit (EAD = LCF*Limit).

For more information on CCF and LCF, see Conversion Measure Options.

Data Types: string | char
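The two conversion measures can be illustrated with plain arithmetic. The sketch below uses the Limit, Drawn, and EAD values from the first row of the sample data shown in the Examples section and solves the two EAD formulas for CCF and LCF; the variable names are illustrative.

```matlab
% Worked arithmetic for one account (values from the first row of EADData).
Limit = 44776;
Drawn = 10907;
EAD   = 44740;

% CCF: share of the undrawn amount (Limit - Drawn) that is converted.
CCF = (EAD - Drawn) / (Limit - Drawn);

% LCF: exposure as a fraction of the limit.
LCF = EAD / Limit;

% Both measures map back to the same EAD.
EADfromCCF = Drawn + CCF*(Limit - Drawn);
EADfromLCF = LCF*Limit;

fprintf("CCF = %.4f, LCF = %.4f\n",CCF,LCF);
```

Note that the CCF is undefined when Drawn equals Limit and becomes large in magnitude when the undrawn amount is small, whereas the LCF only requires a positive limit.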

`CensoringSide` — Censoring side
`"both"` (default) | character vector with value of `'left'`, `'right'`, or `'both'` | string with value of `"left"`, `"right"`, or `"both"`

Censoring side, specified as CensoringSide and a character vector or string. CensoringSide indicates whether the desired Tobit model is left-censored, right-censored, or censored on both sides.

Data Types: string | char

`LeftLimit` — Left-censoring limit
`0` (default) | numeric between `0` and `1`

Left-censoring limit, specified as LeftLimit and a scalar numeric between 0 and 1.

Data Types: double

`RightLimit` — Right-censoring limit
`1` (default) | numeric between `0` and `1`

Right-censoring limit, specified as RightLimit and a scalar numeric between 0 and 1.

Data Types: double

`SolverOptions` — `optimoptions` object
object

Options for fitting, specified as SolverOptions and an optimoptions object that is created using optimoptions from Optimization Toolbox™. The defaults for the optimoptions object are:

"Display" — "none"
"Algorithm" — "sqp"
"MaxFunctionEvaluations" — 500 ⨉ Number of model coefficients
"MaxIterations" — The number of Tobit model coefficients is determined at run time; it depends on the number of predictors and the number of categories in the categorical predictors.

Note

When using optimoptions with a Tobit model, specify the SolverName as fmincon.

Data Types: object
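The following sketch shows one way to pass custom solver options, assuming Optimization Toolbox is installed and the EADData.mat sample data is available. The option values are illustrative, not recommendations.

```matlab
% Sketch: pass custom fmincon options to the Tobit fit.
load EADData.mat

options = optimoptions("fmincon", ...
    Display="iter", ...      % print solver progress instead of the default "none"
    Algorithm="sqp", ...     % same algorithm as the documented default
    MaxIterations=200);      % illustrative value

eadModel = fitEADModel(EADData,"Tobit", ...
    PredictorVars=["UtilizationRate","Age","Marriage"], ...
    ConversionMeasure="lcf", ...
    LimitVar="Limit", ...
    ResponseVar="EAD", ...
    SolverOptions=options);
```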

Properties


`ModelID` — User-defined model ID
`"Tobit"` (default) | string

User-defined model ID, returned as a string.

Data Types: string

`Description` — User-defined description
`""` (default) | string

User-defined description, returned as a string.

Data Types: string

`UnderlyingModel` — Underlying statistical model
Read-only: compact linear model

This property is read-only.

Underlying statistical model, returned as a compact linear model object. The compact version of the underlying regression model is an instance of the classreg.regr.CompactLinearModel class. For more information, see fitlm and CompactLinearModel.

Data Types: CompactLinearModel

`PredictorVars` — Predictor variables
all columns of `data` except for the `ResponseVar` (default) | string array

Predictor variables, returned as a string array.

Data Types: string

`ResponseVar` — Response variable
last column of `data` (default) | string

Response variable, returned as a string.

Data Types: string

`LimitVar` — Limit variable
string

Limit variable, returned as a string.

Data Types: string

`DrawnVar` — Drawn variable
string

Drawn variable, returned as a string.

Data Types: string

`ConversionMeasure` — Conversion measure for EAD response values
`"ccf"` (default) | string with value of `"ccf"` or `"lcf"`

Response transform, returned as a string.

Data Types: string

`CensoringSide` — Censoring side
Read-only: `"both"` (default) | string with value of `"left"`, `"right"`, or `"both"`

This property is read-only.

Censoring side, returned as a string.

Data Types: string

`LeftLimit` — Left-censoring limit
Read-only: `0` (default) | numeric between `0` and `1`

This property is read-only.

Left-censoring limit, returned as a scalar numeric between 0 and 1.

Data Types: double

`RightLimit` — Right-censoring limit
Read-only: `1` (default) | numeric between `0` and `1`

This property is read-only.

Right-censoring limit, returned as a scalar numeric between 0 and 1.

Data Types: double

Object Functions

`predict`	Predict exposure at default
`modelDiscrimination`	Compute AUROC and ROC data
`modelDiscriminationPlot`	Plot ROC curve
`modelCalibration`	Compute R-squared, RMSE, correlation, and sample mean error of predicted and observed EADs
`modelCalibrationPlot`	Scatter plot of predicted and observed EADs

Examples


Create Tobit EAD Model


This example shows how to use fitEADModel to create a Tobit model for exposure at default (EAD).

Load EAD Data

Load the EAD data.

load EADData.mat
head(EADData)

    UtilizationRate    Age     Marriage        Limit         Drawn          EAD    
    _______________    ___    ___________    __________    __________    __________

        0.24359        25     not married         44776         10907         44740
        0.96946        44     not married    2.1405e+05    2.0751e+05         40678
              0        40     married        1.6581e+05             0    1.6567e+05
        0.53242        38     not married    1.7375e+05         92506        1593.5
         0.2583        30     not married         26258        6782.5        54.175
        0.17039        54     married        1.7357e+05         29575        576.69
        0.18586        27     not married         19590          3641        998.49
        0.85372        42     not married    2.0712e+05    1.7682e+05    1.6454e+05

rng('default');
NumObs = height(EADData);
c = cvpartition(NumObs,'HoldOut',0.4);
TrainingInd = training(c);
TestInd = test(c);

Select Model Type

Select a model type for Tobit or Regression.

ModelType = "Tobit";

Select Conversion Measure

Select a conversion measure for the EAD response values.

ConversionMeasure = "LCF";

Create Tobit EAD Model

Use fitEADModel to create a Tobit model using the EADData.

eadModel = fitEADModel(EADData,ModelType,PredictorVars={'UtilizationRate','Age','Marriage'}, ...
    ConversionMeasure=ConversionMeasure,DrawnVar="Drawn",LimitVar="Limit",ResponseVar="EAD");
disp(eadModel);

  Tobit with properties:

        CensoringSide: "both"
            LeftLimit: 0
           RightLimit: 1
              ModelID: "Tobit"
          Description: ""
      UnderlyingModel: [1×1 risk.internal.credit.TobitModel]
        PredictorVars: ["UtilizationRate"    "Age"    "Marriage"]
          ResponseVar: "EAD"
             LimitVar: "Limit"
             DrawnVar: "Drawn"
    ConversionMeasure: "lcf"

Display the underlying model. The underlying model's response variable is the transformation of the EAD response data. Use the 'LimitVar' and 'DrawnVar' name-value arguments to modify the transformation.

disp(eadModel.UnderlyingModel);

Tobit regression model:
     EAD_lcf = max(0,min(Y*,1))
     Y* ~ 1 + UtilizationRate + Age + Marriage

Estimated coefficients:
                             Estimate         SE         tStat       pValue 
                            __________    __________    ________    ________

    (Intercept)                0.22735      0.026076      8.7186           0
    UtilizationRate            0.47364      0.016501      28.703           0
    Age                     -0.0013929    0.00063179     -2.2048    0.027522
    Marriage_not married    -0.0068879      0.012273    -0.56124     0.57466
    (Sigma)                    0.36419     0.0038855      93.731           0

Number of observations: 4378
Number of left-censored observations: 0
Number of uncensored observations: 4377
Number of right-censored observations: 1
Log-likelihood: -1791.06

Predict EAD

EAD prediction operates on the underlying compact statistical model and then transforms the predicted values back to the EAD scale. You can specify the predict function with different options for the 'ModelLevel' name-value argument.

predictedEAD = predict(eadModel,EADData(TestInd,:),ModelLevel="ead");
predictedConversion = predict(eadModel,EADData(TestInd,:),ModelLevel="ConversionMeasure");

Validate EAD Model

For model validation, use modelDiscrimination, modelDiscriminationPlot, modelCalibration, and modelCalibrationPlot.

Use modelDiscrimination and then modelDiscriminationPlot to plot the ROC curve.

ModelLevel = "ConversionMeasure";

[DiscMeasure1,DiscData1] = modelDiscrimination(eadModel,EADData(TestInd,:),ModelLevel=ModelLevel);
modelDiscriminationPlot(eadModel,EADData(TestInd, :),ModelLevel=ModelLevel,SegmentBy="Marriage");

Figure: ROC curves for EAD_lcf, segmented by Marriage (x-axis: False Positive Rate, y-axis: True Positive Rate). Tobit, married: AUROC = 0.70789. Tobit, not married: AUROC = 0.70898.

Use modelCalibration and then modelCalibrationPlot to show a scatter plot of the predictions.

YData = "Observed";

[CalMeasure1,CalData1] = modelCalibration(eadModel,EADData(TestInd,:),ModelLevel=ModelLevel);
modelCalibrationPlot(eadModel,EADData(TestInd,:),ModelLevel=ModelLevel,YData=YData);

Figure: Scatter plot for the Tobit model, R-Squared: 0.16231 (x-axis: EAD_lcf Predicted, y-axis: EAD_lcf Observed), showing the data and the fitted line.

Plot overlaid histograms of the observed and predicted values.

figure;
histogram(CalData1.Observed);
hold on;
histogram(CalData1.(('Predicted_' + ModelType)));
legend('Observed','Predicted');

Figure: Overlaid histograms of the observed and predicted values.

More About


Exposure at Default Tobit Models

The exposure at default (EAD) Tobit models fit a Tobit model to EAD data.

Tobit models are "censored" regression models. Tobit models assume that the response variable can be observed only within certain limits, and no value outside the limits can be observed. Using ModelLevel, you can set the Tobit model level to the EAD, CCF, or LCF scale. The EAD model level has no bounded range, the CCF conversion measure has a range of -Inf to 1, and the LCF conversion measure has a range of 0 to 1. A distribution of response values with a high frequency of observations at the limits is consistent with the model assumptions.

The Tobit model combines the following two formulas:

$\begin{array}{l} Y = \min \{ \max \{ L, Y^{*} \}, R \} \\ Y^{*} = β_{0} + β_{1} X_{1} + ... + β_{p} X_{p} + σ ε = X β + σ ε \end{array}$

where

Y is the observed response variable, the observed EAD data for an EAD model.
L is the left limit, the lower bound for the response values, typically 0 for EAD models.
R is the right limit, the upper bound for the response values, typically 1 for EAD models.
Y^* is a latent, unobserved variable.
β_j is the coefficient of the jth predictor (or the intercept for j = 0).
σ is the standard deviation of the error term.
ϵ is the error term, assumed to follow a standard normal distribution.

The first formula above is written using min and max operators and is equivalent to

$Y = {\begin{cases} L & \text{if } Y^{*} \leq L \\ Y^{*} & \text{if } L < Y^{*} < R \\ R & \text{if } Y^{*} \geq R \end{cases}}$
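The equivalence of the min/max form and the piecewise form can be checked numerically. The sketch below simulates a latent variable with one hypothetical predictor and illustrative coefficient values, then censors it both ways. It uses only base MATLAB.

```matlab
% Sketch: simulate a latent variable and censor it on both sides.
rng(0);                          % fixed seed so the run is repeatable
n     = 1000;
X     = [ones(n,1) rand(n,1)];   % intercept plus one hypothetical predictor
beta  = [0.2; 0.9];              % illustrative coefficients
sigma = 0.4;                     % illustrative error standard deviation
L = 0;  R = 1;                   % left and right limits

Ystar = X*beta + sigma*randn(n,1);   % latent response Y*
Y     = min(max(L,Ystar),R);         % observed response, min/max form

% Piecewise form, applied case by case.
Ypiece = Ystar;
Ypiece(Ystar <= L) = L;
Ypiece(Ystar >= R) = R;

fprintf("Left-censored: %d, right-censored: %d, identical: %d\n", ...
    sum(Y == L),sum(Y == R),isequal(Y,Ypiece));
```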

The standard deviation of the error is explicitly indicated in the formulas. Unlike traditional regression least-squares estimation, where the standard deviation of the error can be inferred from the residuals, for Tobit models the estimation is via maximum likelihood and the standard deviation needs to be handled explicitly during the estimation. If there are p predictor variables, the Tobit model estimates p+2 coefficients, namely, one coefficient for each predictor, plus an intercept, plus a standard deviation.

Three censoring side options are supported in the Tobit EAD models with the CensoringSide name-value argument:

'both' — This option is the default option, with censoring on both sides. The estimation uses left and right limits.
'left' — The left-censored version of the model has no right limit (or R = ∞). The relationship between Y and Y^* is Y = max{L,Y^*}.
'right' — The right-censored version of the model has no left limit (or L = -∞). The relationship between Y and Y^* is Y = min{Y^*,R}.

The parameters of the Tobit model are estimated using maximum likelihood. For observation i = 1,...,n, the likelihood function is

$LF (β, σ | X_{i}, Y_{i}) = {\begin{cases} Φ (L; X_{i} β, σ) & \text{if } Y_{i} \leq L \\ ϕ (Y_{i}; X_{i} β, σ) & \text{if } L < Y_{i} < R \\ 1 - Φ (R; X_{i} β, σ) & \text{if } Y_{i} \geq R \end{cases}}$

where

Φ(x;m,s) is the cumulative normal distribution with mean m and standard deviation s.
φ(x;m,s) is the normal density function with mean m and standard deviation s.

This likelihood function is for models censored on both sides. For left-censored models, the right limit has no effect, and the likelihood function has two cases only (R = ∞); likewise for right-censored models (L = -∞).

The log-likelihood function is the sum of the logarithm of the likelihood functions for individual observations

$LLF (β, σ | X, Y) = \sum_{i = 1}^{n} \log (LF (β, σ | X_{i}, Y_{i}))$

The parameters are estimated by maximizing the log-likelihood function. The only constraint is that the σ parameter must be positive.
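The two-sided likelihood above can be sketched in base MATLAB as follows. This is an illustration on simulated data, not the toolbox implementation: it swaps in fminsearch and enforces the positivity constraint by optimizing over log(σ), whereas the documented default solver settings use fmincon with the "sqp" algorithm. All names and numeric values are assumptions made for the example.

```matlab
% Normal CDF and PDF with mean m and standard deviation s (base MATLAB only).
Phi = @(x,m,s) 0.5*erfc(-(x - m)./(s*sqrt(2)));
phi = @(x,m,s) exp(-0.5*((x - m)./s).^2)./(s*sqrt(2*pi));

% Simulated censored data (illustrative values only).
rng(0);
n = 2000;  L = 0;  R = 1;
X = [ones(n,1) rand(n,1)];
betaTrue = [0.2; 0.9];  sigmaTrue = 0.4;
Y = min(max(L,X*betaTrue + sigmaTrue*randn(n,1)),R);

% Per-observation likelihood: one term per case of the piecewise definition.
LF = @(beta,sigma) (Y <= L).*Phi(L,X*beta,sigma) ...
    + (Y > L & Y < R).*phi(Y,X*beta,sigma) ...
    + (Y >= R).*(1 - Phi(R,X*beta,sigma));

% Log-likelihood: sum of the log of the individual likelihoods.
LLF = @(beta,sigma) sum(log(LF(beta,sigma)));

% Minimize the negative log-likelihood; theta = [beta; log(sigma)].
negLLF   = @(theta) -LLF(theta(1:2),exp(theta(3)));
theta0   = [0; 0; log(std(Y))];
opts     = optimset('MaxFunEvals',5000,'MaxIter',5000);
thetaHat = fminsearch(negLLF,theta0,opts);

betaHat  = thetaHat(1:2);
sigmaHat = exp(thetaHat(3));
disp([betaHat; sigmaHat]')
```

With one predictor, the sketch estimates three values (intercept, slope, and σ), in line with the p+2 count described earlier.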

To predict an EAD value, Tobit EAD models return the unconditional expected value of the response, given the predictor values

$EAD_{i}^{pred} = E [Y_{i} | X_{i}]$

The expression for the expected value can be separated into the cases

$\begin{array}{l} E [Y] = E [Y | Y = L] P (Y = L) \\ + E [Y | L < Y < R] P (L < Y < R) \\ + E [Y | Y = R] P (Y = R) \end{array}$

Using the previous expression and the properties of the (truncated) normal distribution, it follows that

$E [Y_{i} | X_{i}] = Φ (a_{i}) L + (Φ (b_{i}) - Φ (a_{i})) (X_{i} β + σ λ_{i}) + (1 - Φ (b_{i})) R$

where

$a_{i} = \frac{L - X_{i} β}{σ}, \quad b_{i} = \frac{R - X_{i} β}{σ}, \quad \text{and} \quad λ_{i} = \frac{ϕ (a_{i}) - ϕ (b_{i})}{Φ (b_{i}) - Φ (a_{i})}$

This expression applies to the models censored on both sides. For models censored on one side only, the corresponding expressions can be derived from here. For example, for left-censored models, let the R limit in the expression above go to infinity, and the resulting expression is

$E [Y_{i} | X_{i}] = Φ (a_{i}) L + (1 - Φ (a_{i})) (X_{i} β + σ \frac{ϕ (a_{i})}{1 - Φ (a_{i})})$

Similarly, for right-censored models, the L limit is decreased to minus infinity to get

$E [Y_{i} | X_{i}] = Φ (b_{i}) (X_{i} β - σ \frac{ϕ (b_{i})}{Φ (b_{i})}) + (1 - Φ (b_{i})) R$
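The closed-form expected value for the two-sided case can be checked against a simulation. In the sketch below, Φ and ϕ with a single argument are taken to be the standard normal CDF and PDF, and the values of σ and of the linear predictor X_i β are illustrative assumptions. Only base MATLAB is used.

```matlab
% Standard normal CDF and PDF (base MATLAB only).
Phi = @(z) 0.5*erfc(-z./sqrt(2));
phi = @(z) exp(-0.5*z.^2)./sqrt(2*pi);

L = 0;  R = 1;                % censoring limits
sigma = 0.36;                 % illustrative error standard deviation
xb = [-0.5; 0.3; 0.8; 1.6];   % illustrative values of the linear predictor

a = (L - xb)./sigma;
b = (R - xb)./sigma;
lambda = (phi(a) - phi(b))./(Phi(b) - Phi(a));

% Closed-form expected value for a model censored on both sides.
EY = Phi(a).*L + (Phi(b) - Phi(a)).*(xb + sigma.*lambda) + (1 - Phi(b)).*R;

% Monte Carlo check: simulate the latent variable, censor it, and average.
rng(0);
nSim  = 2e5;
Ysim  = min(max(L,xb + sigma*randn(numel(xb),nSim)),R);
EYsim = mean(Ysim,2);

disp(table(xb,EY,EYsim))
```

The predicted values always fall strictly between L and R, even when the linear predictor lies outside the limits, because the expectation averages over the censored outcomes.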

Conversion Measure Options

You can relate the EAD to a scaling variable and derive conversion measures like credit conversion factor (CCF) and limit conversion factor (LCF) using the 'ccf' or 'lcf' options for the ConversionMeasure name-value argument.

The following table summarizes the supported transformations using the 'ccf' or 'lcf' options for the ConversionMeasure name-value argument:

Measure	EAD Formula	Lower Bound	Upper Bound	Inverse Transformation
CCF	`EAD = Drawn + CCF × (Limit - Drawn)`	-Inf	1	`CCF = 1 - e^{(- CCF_t)}`
LCF	`EAD = LCF × Limit`	0	1	`LCF = e^{LCF_t} / (1 + e^{LCF_t})`
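The inverse transformations in the table can be written as short functions. In the sketch below, CCF_t and LCF_t are assumed to denote the transformed values. The forward transformations are not stated in the table; they are obtained here by solving the inverse formulas algebraically, so treat them as a derivation for illustration.

```matlab
% Inverse transformations, as listed in the table.
ccfInverse = @(ccf_t) 1 - exp(-ccf_t);
lcfInverse = @(lcf_t) exp(lcf_t)./(1 + exp(lcf_t));

% Forward transformations, derived by inverting the formulas above.
ccfForward = @(ccf) -log(1 - ccf);         % defined for CCF < 1
lcfForward = @(lcf) log(lcf./(1 - lcf));   % defined for 0 < LCF < 1

% Round trip on a few illustrative values inside the bounds.
ccf = [-2 0 0.5 0.9];
lcf = [0.05 0.5 0.95];

ccfRoundTrip = ccfInverse(ccfForward(ccf));
lcfRoundTrip = lcfInverse(lcfForward(lcf));

disp(max(abs(ccfRoundTrip - ccf)))
disp(max(abs(lcfRoundTrip - lcf)))
```

Whatever the transformed value, the inverse formulas return a CCF below 1 and an LCF strictly between 0 and 1, which matches the bounds in the table.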


Version History

Introduced in R2021b


R2023a: `modelAccuracy` object function is renamed to `modelCalibration` function

The modelAccuracy object function is renamed to modelCalibration. The use of modelAccuracy is discouraged; use modelCalibration instead.

R2023a: `modelAccuracyPlot` object function is renamed to `modelCalibrationPlot` function

The modelAccuracyPlot object function is renamed to modelCalibrationPlot. The use of modelAccuracyPlot is discouraged; use modelCalibrationPlot instead.
