
Generate Code to Classify Data in Table

This example shows how to generate code for classifying numeric and categorical data in a table using a binary decision tree model. The trained model in this example identifies categorical predictors in the CategoricalPredictors property; therefore, the software handles categorical predictors automatically. You do not need to create dummy variables manually for categorical predictors to generate code.

In the general code generation workflow, you can train a classification or regression model on data in a table. You pass arrays (instead of a table) to your entry-point function for prediction, create a table inside the entry-point function, and then pass the table to predict. For more information on table support in code generation, see Code Generation for Tables (MATLAB Coder) and Table Limitations for Code Generation (MATLAB Coder).

Train Classification Model

Load the patients data set. Create a table that contains numeric predictors of type single and double, categorical predictors of type categorical, and the response variable Smoker of type logical. Each row of the table corresponds to a different patient.

load patients
Age = single(Age);
Weight = single(Weight);
Gender = categorical(Gender);
SelfAssessedHealthStatus = categorical(SelfAssessedHealthStatus);
Tbl = table(Age,Diastolic,Systolic,Weight,Gender,SelfAssessedHealthStatus,Smoker);

Train a classification tree using the data in Tbl.

Mdl = fitctree(Tbl,'Smoker')
Mdl = 
           PredictorNames: {'Age'  'Diastolic'  'Systolic'  'Weight'  'Gender'  'SelfAssessedHealthStatus'}
             ResponseName: 'Smoker'
    CategoricalPredictors: [5 6]
               ClassNames: [0 1]
           ScoreTransform: 'none'
          NumObservations: 100

The CategoricalPredictors property value is [5 6], which indicates that Mdl identifies the 5th and 6th predictors ('Gender' and 'SelfAssessedHealthStatus') as categorical predictors. To identify any other predictors as categorical predictors, you can specify them by using the 'CategoricalPredictors' name-value argument.
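As a minimal sketch of the 'CategoricalPredictors' name-value argument mentioned above, the following script names the categorical predictors explicitly instead of relying on automatic detection. The variable name MdlExplicit is hypothetical and used only for illustration. The script rebuilds the table so that it runs on its own.

```matlab
% Rebuild the training table from the patients data set.
load patients
Age = single(Age);
Weight = single(Weight);
Gender = categorical(Gender);
SelfAssessedHealthStatus = categorical(SelfAssessedHealthStatus);
Tbl = table(Age,Diastolic,Systolic,Weight,Gender,SelfAssessedHealthStatus,Smoker);

% Name the categorical predictors explicitly (MdlExplicit is a
% hypothetical name used only in this sketch).
MdlExplicit = fitctree(Tbl,'Smoker', ...
    'CategoricalPredictors',{'Gender','SelfAssessedHealthStatus'});

% Display the indices of the predictors that the model treats as categorical.
MdlExplicit.CategoricalPredictors
```

To treat a different set of predictors as categorical, list their variable names in the cell array.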

Display the predictor names and their order in Mdl.

Mdl.PredictorNames
ans = 1x6 cell
    {'Age'}    {'Diastolic'}    {'Systolic'}    {'Weight'}    {'Gender'}    {'SelfAssessedHealthStatus'}

Save Model

Save the tree classifier to a file using saveLearnerForCoder.

saveLearnerForCoder(Mdl,'TreeModel');

saveLearnerForCoder saves the classifier to the MATLAB® binary file TreeModel.mat as a structure array in the current folder.
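Before writing the entry-point function, it can be useful to confirm that the saved file round-trips in MATLAB. The following is a minimal sketch, assuming the current folder is writable. The names MdlLoaded and sameLabels are hypothetical and used only for illustration.

```matlab
% Rebuild the table and train the tree so that the sketch is self-contained.
load patients
Age = single(Age);
Weight = single(Weight);
Gender = categorical(Gender);
SelfAssessedHealthStatus = categorical(SelfAssessedHealthStatus);
Tbl = table(Age,Diastolic,Systolic,Weight,Gender,SelfAssessedHealthStatus,Smoker);
Mdl = fitctree(Tbl,'Smoker');

% Save the model, and then reload it the same way the entry-point function does.
saveLearnerForCoder(Mdl,'TreeModel');
MdlLoaded = loadLearnerForCoder('TreeModel');  % hypothetical variable name

% Compare the labels predicted by the original and the reloaded model.
sameLabels = isequal(predict(Mdl,Tbl),predict(MdlLoaded,Tbl))
```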

Define Entry-Point Function

Define the entry-point function predictSmoker, which takes predictor variables as input arguments. Within the function, load the tree classifier by using loadLearnerForCoder, create a table from the input arguments, and then pass the classifier and table to predict.

function [labels,scores] = predictSmoker(age,diastolic,systolic,weight,gender,selfAssessedHealthStatus) %#codegen
%PREDICTSMOKER Label new observations using a trained tree model
%   predictSmoker predicts whether patients are smokers (1) or nonsmokers
%   (0) based on their age, diastolic blood pressure, systolic blood
%   pressure, weight, gender, and self assessed health status. The function
%   also provides classification scores indicating the likelihood that a
%   predicted label comes from a particular class (smoker or nonsmoker).
mdl = loadLearnerForCoder('TreeModel');
varnames = mdl.PredictorNames;
tbl = table(age,diastolic,systolic,weight,gender,selfAssessedHealthStatus, ...
    'VariableNames',varnames);
[labels,scores] = predict(mdl,tbl);
end

When you create a table inside an entry-point function, you must specify the variable names (for example, by using the 'VariableNames' name-value pair argument of table). If your table contains only predictor variables, and the predictors are in the same order as in the table used to train the model, then you can find the predictor variable names in mdl.PredictorNames.
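If you prefer not to read the names from mdl.PredictorNames, one alternative is to list them directly. The following is a sketch only: the function name predictSmokerNamed is hypothetical, and the sketch assumes that TreeModel.mat exists in the current folder and that the listed names match the predictor names used for training exactly, including case.

```matlab
function [labels,scores] = predictSmokerNamed(age,diastolic,systolic,weight,gender,selfAssessedHealthStatus) %#codegen
%PREDICTSMOKERNAMED Hypothetical variant of predictSmoker that lists the
%   table variable names explicitly instead of reading mdl.PredictorNames.
mdl = loadLearnerForCoder('TreeModel');
% The names must match the predictor names used to train the model.
tbl = table(age,diastolic,systolic,weight,gender,selfAssessedHealthStatus, ...
    'VariableNames',{'Age','Diastolic','Systolic','Weight','Gender','SelfAssessedHealthStatus'});
[labels,scores] = predict(mdl,tbl);
end
```

Listing the names by hand makes the function independent of the stored predictor order, but the list must be updated whenever the training table changes.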

Generate Code

Generate code for predictSmoker by using codegen. Specify the data type and dimensions of the predictor variable input arguments using coder.typeof.

  • The first input argument of coder.typeof specifies the data type of the predictor.

  • The second input argument specifies the upper bound on the number of rows (Inf) and columns (1) in the predictor.

  • The third input argument specifies that the number of rows in the predictor can change at run time but the number of columns is fixed.

ARGS = cell(6,1);
ARGS{1} = coder.typeof(Age,[Inf 1],[1 0]);
ARGS{2} = coder.typeof(Diastolic,[Inf 1],[1 0]);
ARGS{3} = coder.typeof(Systolic,[Inf 1],[1 0]);
ARGS{4} = coder.typeof(Weight,[Inf 1],[1 0]);
ARGS{5} = coder.typeof(Gender,[Inf 1],[1 0]);
ARGS{6} = coder.typeof(SelfAssessedHealthStatus,[Inf 1],[1 0]);

codegen predictSmoker -args ARGS
Code generation successful.

codegen generates the MEX function predictSmoker_mex with a platform-dependent extension in your current folder.
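To see what the three coder.typeof arguments described above encode, you can create a type object and inspect its properties. This is a minimal sketch that requires MATLAB Coder. The variable name ageType is hypothetical, and single(0) stands in for the Age predictor.

```matlab
% Describe a single-precision column vector with an unbounded number of rows.
ageType = coder.typeof(single(0),[Inf 1],[1 0]);  % hypothetical variable name

ageType.ClassName      % data type of the argument
ageType.SizeVector     % upper bounds on the number of rows and columns
ageType.VariableDims   % dimensions that can change size at run time
```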

Verify Generated Code

Verify that predict, predictSmoker, and the MEX file return the same results for a random sample of 20 patients.

rng('default') % For reproducibility
[newTbl,idx] = datasample(Tbl,20);

[labels1,scores1] = predict(Mdl,newTbl);
[labels2,scores2] = predictSmoker(Age(idx),Diastolic(idx),Systolic(idx),Weight(idx),Gender(idx),SelfAssessedHealthStatus(idx));
[labels3,scores3] = predictSmoker_mex(Age(idx),Diastolic(idx),Systolic(idx),Weight(idx),Gender(idx),SelfAssessedHealthStatus(idx));

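verifyMEXlabels = isequal(labels1,labels2,labels3)
verifyMEXlabels = logical
   1

verifyMEXscores = isequal(scores1,scores2,scores3)
verifyMEXscores = logical
   1

isequal returns logical 1 (true), which means that all the inputs are equal. The comparison confirms that predict, predictSmoker, and the MEX function return the same results.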
