generateLearnerDataTypeFcn

Generate function that defines data types for fixed-point code generation

Description

To generate fixed-point C/C++ code for the predict function of a machine learning model, use generateLearnerDataTypeFcn, saveLearnerForCoder, loadLearnerForCoder, and codegen (MATLAB Coder).

  • After training a machine learning model, save the model using saveLearnerForCoder.

  • Create a structure that defines fixed-point data types by using the function generated from generateLearnerDataTypeFcn.

  • Define an entry-point function that loads the model by using both loadLearnerForCoder and the structure, and then calls the predict function.

  • Generate code using codegen, and then verify the generated code.

The generateLearnerDataTypeFcn function requires Fixed-Point Designer™, and generating fixed-point C/C++ code requires MATLAB® Coder™ and Fixed-Point Designer.

This flow chart shows the fixed-point code generation workflow for the predict function of a machine learning model. Use generateLearnerDataTypeFcn for the highlighted step.

Fixed-point code generation workflow. Step 1: Train a model. Step 2: Save the model. Step 3 (highlighted): Define the fixed-point data types. Step 4: Define an entry-point function. Step 5 (optional): Optimize the fixed-point data types. Step 6: Generate code. Step 7: Verify the generated code.

generateLearnerDataTypeFcn(filename,X) generates a data type function that defines fixed-point data types for the variables required to generate fixed-point C/C++ code for prediction of a machine learning model. filename stores the machine learning model, and X contains the predictor data for the predict function of the model.

Use the generated function to create a structure that defines fixed-point data types. Then, use the structure as the input argument T of loadLearnerForCoder.

generateLearnerDataTypeFcn(filename,X,Name,Value) specifies additional options by using one or more name-value pair arguments. For example, you can specify 'WordLength',32 to use 32-bit word length for the fixed-point data types.

Examples

After training a machine learning model, save the model using saveLearnerForCoder. For fixed-point code generation, specify the fixed-point data types of the variables required for prediction by using the data type function generated by generateLearnerDataTypeFcn. Then, define an entry-point function that loads the model by using both loadLearnerForCoder and the specified fixed-point data types, and calls the predict function of the model. Use codegen (MATLAB Coder) to generate fixed-point C/C++ code for the entry-point function, and then verify the generated code.

Before generating code using codegen, you can use buildInstrumentedMex (Fixed-Point Designer) and showInstrumentationResults (Fixed-Point Designer) to optimize the fixed-point data types to improve the performance of the fixed-point code. Record minimum and maximum values of named and internal variables for prediction by using buildInstrumentedMex. View the instrumentation results using showInstrumentationResults; then, based on the results, tune the fixed-point data type properties of the variables. For details regarding this optional step, see Fixed-Point Code Generation for Prediction of SVM.

Train Model

Load the ionosphere data set and train a binary SVM classification model.

load ionosphere
Mdl = fitcsvm(X,Y,'KernelFunction','gaussian');

Mdl is a ClassificationSVM model.

Save Model

Save the SVM classification model to the file myMdl.mat by using saveLearnerForCoder.

saveLearnerForCoder(Mdl,'myMdl');

Define Fixed-Point Data Types

Use generateLearnerDataTypeFcn to generate a function that defines the fixed-point data types of the variables required for prediction of the SVM model.

generateLearnerDataTypeFcn('myMdl',X)

generateLearnerDataTypeFcn generates the myMdl_datatype function.

Create a structure T that defines the fixed-point data types by using myMdl_datatype.

T = myMdl_datatype('Fixed')
T = struct with fields:
               XDataType: [0x0 embedded.fi]
           ScoreDataType: [0x0 embedded.fi]
    InnerProductDataType: [0x0 embedded.fi]

The structure T includes the fields for the named and internal variables required to run the predict function. Each field contains a fixed-point object, returned by fi (Fixed-Point Designer). The fixed-point object specifies fixed-point data type properties, such as word length and fraction length. For example, display the fixed-point data type properties of the predictor data.

T.XDataType
ans = 

[]

          DataTypeMode: Fixed-point: binary point scaling
            Signedness: Signed
            WordLength: 16
        FractionLength: 14

        RoundingMethod: Floor
        OverflowAction: Wrap
           ProductMode: FullPrecision
  MaxProductWordLength: 128
               SumMode: FullPrecision
      MaxSumWordLength: 128
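
As a rough illustration of what these properties mean numerically, the following sketch emulates a signed 16-bit type with 14 fraction bits using only base MATLAB arithmetic. The helper names (toStored, quantize) and the sample values are hypothetical, and the sketch only approximates what a fi object does with Floor rounding and Wrap overflow; it is not the Fixed-Point Designer implementation.

```matlab
% Base-MATLAB sketch of signed fixed-point quantization with Floor
% rounding and Wrap overflow (assumption: two's complement wrapping).
wl = 16;          % word length in bits, as in T.XDataType
fl = 14;          % fraction length in bits, as in T.XDataType

resolution = 2^-fl;                         % smallest representable step
lowerBound = -2^(wl-1) * resolution;        % most negative value: -2
upperBound = (2^(wl-1) - 1) * resolution;   % most positive value: 2 - 2^-14

% Scale to integers, round toward negative infinity (Floor), then wrap
% the result into the signed wl-bit integer range (Wrap).
toStored = @(x) mod(floor(x ./ resolution) + 2^(wl-1), 2^wl) - 2^(wl-1);
quantize = @(x) toStored(x) .* resolution;

% Hypothetical sample values: two in range, one overflowing, one at the bound
xq = quantize([0.3 1.99999 2.5 -2]);
```

In this sketch, values inside the representable range lose at most one step of 2^-14, whereas 2.5 wraps around to -1.5. This is why the proposed fraction length must leave enough range for the data.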

Define Entry-Point Function

Define an entry-point function named myFixedPointPredict that does the following:

  • Accept the predictor data X and the fixed-point data type structure T.

  • Load a fixed-point version of a trained SVM classification model by using both loadLearnerForCoder and the structure T.

  • Predict labels and scores using the loaded model.

function [label,score] = myFixedPointPredict(X,T) %#codegen
Mdl = loadLearnerForCoder('myMdl','DataType',T);
[label,score] = predict(Mdl,X);
end

Note: Save this function in the file myFixedPointPredict.m, in the current folder or on the MATLAB® path, before generating code.

Generate Code

The XDataType field of the structure T specifies the fixed-point data type of the predictor data. Convert X to the type specified in T.XDataType by using the cast (Fixed-Point Designer) function.

X_fx = cast(X,'like',T.XDataType);

Generate code for the entry-point function using codegen. Specify X_fx and constant folded T as input arguments of the entry-point function.

codegen myFixedPointPredict -args {X_fx,coder.Constant(T)}
Code generation successful.

codegen generates the MEX function myFixedPointPredict_mex with a platform-dependent extension.

Verify Generated Code

Pass predictor data to predict and myFixedPointPredict_mex to compare the outputs.

[labels,scores] = predict(Mdl,X);
[labels_fx,scores_fx] = myFixedPointPredict_mex(X_fx,T);

Compare the outputs from predict and myFixedPointPredict_mex.

verify_labels = isequal(labels,labels_fx)
verify_labels = logical
   1

isequal returns logical 1 (true), which means labels and labels_fx are equal.

If you are not satisfied with the comparison results and want to improve the precision of the generated code, you can tune the fixed-point data types and regenerate the code. For details, see Tips, Data Type Function, and Fixed-Point Code Generation for Prediction of SVM.

Use Lookup Table to Approximate Score Transformation (since R2023a)

Create a fixed-point data type structure using the function generated by generateLearnerDataTypeFcn. Update the structure to include a lookup table that approximates the score transformation function of a trained classifier. Then, generate fixed-point code using the updated structure.

Train Model

Load the census1994 data set, which contains the variables adultdata and adulttest. These variables contain demographic data from the US Census Bureau used to predict whether an individual makes over $50,000 a year. You can use adultdata to train a model and adulttest to test the trained model.

load census1994

Consider a model that predicts the salary category of employees given their age, education level, capital gain and loss, and number of working hours per week. Extract the variables of interest and save them in tables.

tbl = adultdata(:,{'age','education_num','capital_gain','capital_loss','hours_per_week'});
tblTest = adulttest(:,{'age','education_num','capital_gain','capital_loss','hours_per_week'});

Fixed-point code generation does not support tables or categorical arrays. So, define the predictor data using a numeric matrix, and define the class labels using a logical vector. A logical vector uses memory most efficiently in a binary classification problem.

X = table2array(tbl);
Y = adultdata.salary == '<=50K';

XTest = table2array(tblTest);
YTest = adulttest.salary == '<=50K';

The software uses one fixed-point data type for the predictor data. Therefore, normalizing predictor data makes the fixed-point data type more robust against overflows and underflows. Also, normalization reduces the amount of computation required to generate the lookup table for score transformation.

Normalize the predictor data X and XTest. Use the mean and standard deviation of the training data X to normalize the test data XTest.

[X,C,S] = normalize(X);
XTest = normalize(XTest,'Center',C,'Scale',S);
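
The effect of normalization on a single shared data type can be sketched with base MATLAB arithmetic. The data below are made-up [age, capital_gain] rows, and the fraction-length formula is a simplified assumption (signed type, no safety margin), not the rule that generateLearnerDataTypeFcn applies.

```matlab
% Hypothetical predictor rows: [age, capital_gain]
Xdemo = [25 0; 38 5000; 52 99999; 61 1500];
wl = 16;    % word length in bits

% Largest fraction length of a signed wl-bit type whose range covers the
% largest magnitude in A (simplified assumption; ignores any safety margin).
maxFractionLength = @(A) wl - 1 - (floor(log2(max(abs(A(:))))) + 1);

% Raw data: the capital_gain column dictates the shared data type
flRaw = maxFractionLength(Xdemo);

% z-score each column with its own mean and standard deviation
Zdemo = (Xdemo - mean(Xdemo)) ./ std(Xdemo);
flNorm = maxFractionLength(Zdemo);
```

With the raw data, the capital_gain column forces a negative fraction length (-2 here), so the age column loses all fractional precision. After normalization, both columns fit in a type with 14 fraction bits.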

Train a classification tree model. Specify the minimum number of leaf node observations (MinLeafSize) and the maximum number of decision splits (MaxNumSplits) to reduce the memory footprint of the model. Specify logit for the score transformation.

Mdl = fitctree(X,Y,'Weights',adultdata.fnlwgt, ...
    'MinLeafSize',10,'MaxNumSplits',100, ...
    'ScoreTransform','logit');

Mdl is a ClassificationTree model.

Compute the classification error for the training data set and the test data set.

loss(Mdl,X,Y) 
ans = 
0.1620
loss(Mdl,XTest,YTest)
ans = 
0.1683

The classifier misclassifies approximately 16% of the training data and 17% of the test data.

Save Model

Save the classification model to the file myMdl.mat by using saveLearnerForCoder.

saveLearnerForCoder(Mdl,'myMdl');

Define Fixed-Point Data Types

Use generateLearnerDataTypeFcn to generate a function that defines the fixed-point data types of the variables required for prediction. Use all available predictor data to obtain realistic ranges for the fixed-point data types.

generateLearnerDataTypeFcn('myMdl',[X;XTest])

generateLearnerDataTypeFcn generates the myMdl_datatype function.

Create a structure T that defines the fixed-point data types by using myMdl_datatype.

T = myMdl_datatype('Fixed')
T = struct with fields:
                   XDataType: [0x0 embedded.fi]
    TransformedScoreDataType: [0x0 embedded.fi]
               ScoreDataType: [0x0 embedded.fi]

Generate Lookup Table to Approximate Score Transformation

Create a FunctionApproximation.TransformFunction (Fixed-Point Designer) object by specifying these inputs:

  • The first two inputs are the same as the inputs to the generateLearnerDataTypeFcn function. The first input is the name of the file that contains the trained model, and the second input is the predictor data.

  • The third input is the structure generated by calling the myMdl_datatype function.

approxObj = FunctionApproximation.TransformFunction('myMdl',[X;XTest],T)
approxObj = 
  TransformFunction with properties:

    Problem: [1x1 FunctionApproximation.ClassregProblem]

approxObj has the property Problem, which contains a ClassregProblem object. Display the Problem property of approxObj.

approxObj.Problem
ans = 
  ClassregProblem with properties:

    FunctionToApproximate: @(x)(1./(1+exp(-x)))
           NumberOfInputs: 1
               InputTypes: [1x1 embedded.numerictype]
         InputLowerBounds: -3.5296
         InputUpperBounds: 13.3944
               OutputType: [1x1 embedded.numerictype]
                  Options: [1x1 FunctionApproximation.Options]

The ClassregProblem object in the Problem property contains the information extracted from the inputs you specify when you create approxObj. For details, see the FunctionApproximation.ClassregProblem (Fixed-Point Designer) reference page.

The input predictor data that you specify when you create the FunctionApproximation.TransformFunction object determines the bounds InputLowerBounds and InputUpperBounds. If the range between the two bounds is too large, generating the lookup table can be time consuming. In this example, you normalized X and scaled XTest to reduce the range.

Create a new data type structure T_new by using the approximate (Fixed-Point Designer) function.

T_new = approximate(approxObj)
T_new = struct with fields:
                   XDataType: [0x0 embedded.fi]
    TransformedScoreDataType: [0x0 embedded.fi]
               ScoreDataType: [0x0 embedded.fi]
         LookupTableFunction: '@myMdl_lookup'

approximate returns the output T_new and generates the lookup table function myMdl_lookup. T_new contains all the fields stored in T and an additional field LookupTableFunction, which contains a function handle to myMdl_lookup.
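
To get a feel for what a lookup table approximation involves, the sketch below tabulates the logit transform over the input bounds reported in the Problem property and interpolates linearly between breakpoints, using only base MATLAB. The number of breakpoints (65) and the names breakpoints, tableValues, and lookup are hypothetical choices. The approximate function selects its own table layout and data types, so this is an illustration of the idea rather than the generated myMdl_lookup function.

```matlab
% Logit score transformation, as shown in FunctionToApproximate
f = @(x) 1 ./ (1 + exp(-x));

% Input bounds reported by approxObj.Problem in this example
lo = -3.5296;
hi = 13.3944;

% Hypothetical table: 65 evenly spaced breakpoints and their function values
breakpoints = linspace(lo, hi, 65);
tableValues = f(breakpoints);

% Table lookup with linear interpolation; inputs are clamped to the bounds
lookup = @(x) interp1(breakpoints, tableValues, min(max(x, lo), hi));

% Worst-case absolute error of the table on a dense grid
xq = linspace(lo, hi, 10001);
maxErr = max(abs(lookup(xq) - f(xq)));
```

For a fixed number of breakpoints, a wider input range means coarser spacing and a larger error, so keeping the same accuracy requires a larger table. This is one way to see why a large range between the bounds makes table generation more expensive.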

Define Entry-Point Function

Define an entry-point function named myFixedPointPredict that predicts classification labels and scores using a trained model.

function [label,score] = myFixedPointPredict(X,T) %#codegen
Mdl = loadLearnerForCoder('myMdl','DataType',T);
[label,score] = predict(Mdl,X);
end

Generate Code

Convert XTest to the type specified in T_new.XDataType by using the cast (Fixed-Point Designer) function.

XTest_fx = cast(XTest,'like',T_new.XDataType);

Generate code for the entry-point function using codegen. Instead of specifying a variable-size input for the predictor data set, specify a fixed-size input by using coder.typeof. If you know the size of the predictor data set to pass to the generated code, then generating code for the fixed-size input is preferable for simplicity.

codegen myFixedPointPredict -args {coder.typeof(XTest_fx,[1,5],[0,0]),coder.Constant(T_new)}
Code generation successful.

codegen generates the MEX function myFixedPointPredict_mex with a platform-dependent extension.

Verify Generated Code

Pass the test data set XTest to predict and myFixedPointPredict_mex to compare the outputs.

[labels,scores] = predict(Mdl,XTest);
n = size(XTest,1);
labels_fx = true(n,1);
scores_fx = zeros(n,2);
for i = 1:n
    [labels_fx(i),scores_fx(i,:)] = myFixedPointPredict_mex(XTest_fx(i,:),T_new);
end

Compare the outputs from predict and myFixedPointPredict_mex.

verify_labels = isequal(labels,labels_fx)
verify_labels = logical
   0

isequal returns logical 0 (false), which means labels and labels_fx are not the same. Find the mismatched labels.

idx = find(labels_fx ~= labels)
idx = 2×1

        4815
       14114

The MEX function returns the same labels as the predict function except for two samples in XTest.

Find the maximum of the relative differences between the score outputs.

relDiff_scores = max(abs((double(scores_fx(:,1))-scores(:,1))./scores(:,1)))
relDiff_scores = 
0.0407

Input Arguments

filename

Name of the MATLAB formatted binary file (MAT-file) that contains the structure array representing a model object, specified as a character vector or string scalar.

You must create the filename file using saveLearnerForCoder. The model in filename can be a decision tree, an ensemble of decision trees, or an SVM model, for either classification or regression.

The extension of the filename file must be .mat. If filename has no extension, then generateLearnerDataTypeFcn appends .mat.

If filename does not include a full path, then generateLearnerDataTypeFcn loads the file from the current folder.

Example: 'myMdl'

Data Types: char | string

X

Predictor data for the predict function of the model stored in filename, specified as an n-by-p numeric matrix, where n is the number of observations and p is the number of predictor variables.

Data Types: single | double

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: generateLearnerDataTypeFcn(filename,X,'OutputFunctionName','myDataTypeFcn','WordLength',32) generates a data type function named myDataTypeFcn that uses 32 bits for the word length when defining the fixed-point data type for each variable.

OutputFunctionName

Name of the generated function, specified as the comma-separated pair consisting of 'OutputFunctionName' and a character vector or string scalar. The 'OutputFunctionName' value must be a valid MATLAB function name.

The default function name is the file name in filename followed by _datatype. For example, if filename is myMdl, then the default function name is myMdl_datatype.

Example: 'OutputFunctionName','myDataTypeFcn'

Data Types: char | string

WordLength

Word length in bits, specified as the comma-separated pair consisting of 'WordLength' and a numeric scalar.

The generated data type function defines a fixed-point object for each variable using the specified 'WordLength' value. If a variable requires a longer word length than the specified value, the software doubles the word length for the variable.

The optimal word length depends on your target hardware properties. When the specified word length is longer than the longest word size of your target hardware, the generated code contains multiword operations.

For details, see Fixed-Point Data Types (Fixed-Point Designer).

Example: 'WordLength',32

Data Types: single | double

OutputRange

Range of the output argument of the predict function, specified as the comma-separated pair consisting of 'OutputRange' and a numeric vector of two elements (minimum and maximum values of the output).

The 'OutputRange' value specifies the range of predicted class scores for a classification model and the range of predicted responses for a regression model. The following tables list the output arguments for which you can specify the range by using the 'OutputRange' name-value pair argument.

Classification Model

Model                         predict Function of Model    Output Argument
Decision tree                 predict                      score
Ensemble of decision trees    predict                      score
SVM                           predict                      score

Regression Model

Model                         predict Function of Model    Output Argument
Decision tree                 predict                      Yfit
Ensemble of decision trees    predict                      Yfit
SVM                           predict                      yfit

When X contains a large number of observations and the range for the output argument is known, specify the 'OutputRange' value to reduce the amount of computation.

If you do not specify the 'OutputRange' value, then the software simulates the output range using the predictor data X and the predict function.

The software determines the span of numbers that the fixed-point data can represent by using the 'OutputRange' value and the 'PercentSafetyMargin' value.

Example: 'OutputRange',[0,1]

Data Types: single | double

PercentSafetyMargin

Safety margin percentage, specified as the comma-separated pair consisting of 'PercentSafetyMargin' and a numeric scalar.

For each variable, the software simulates the range of the variable and adds the specified safety margin to determine the span of numbers that the fixed-point data can represent. Then, the software proposes the maximum fraction length that does not cause overflows.

Use caution when you specify the 'PercentSafetyMargin' value. If a variable range is large, then increasing the safety margin can cause underflow, because the software decreases fraction length to represent a larger range using a given word length.
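
The interaction between range, safety margin, and fraction length can be sketched in base MATLAB. The search below assumes a signed type and assumes that the margin widens each simulated bound by the given percentage. Both are simplifying assumptions for illustration, and the simulated range [-100, 100] is hypothetical.

```matlab
wl = 16;                  % word length in bits
simRange = [-100 100];    % hypothetical simulated [min max] of a variable
margins = [10 30];        % safety margins to compare, in percent

fl = zeros(size(margins));
for k = 1:numel(margins)
    % Assumption: the margin scales both bounds away from zero
    r = simRange * (1 + margins(k)/100);

    % Start from a generous fraction length and shrink it until the
    % signed wl-bit range [-2^(wl-1-f), 2^(wl-1-f) - 2^-f] covers r
    f = wl + 16;
    while r(1) < -2^(wl-1-f) || r(2) > 2^(wl-1-f) - 2^-f
        f = f - 1;
    end
    fl(k) = f;
end
```

Under these assumptions, a 10% margin leaves 8 fraction bits, while a 30% margin pushes the upper bound past 128 and leaves only 7. Each lost fraction bit doubles the quantization step, which is how a larger margin can lead to underflow for small values.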

Example: 'PercentSafetyMargin',15

Data Types: single | double

More About

Data Type Function

Use the data type function generated by generateLearnerDataTypeFcn to create a structure that defines fixed-point data types for the variables required to generate fixed-point C/C++ code for prediction of a machine learning model. Use the output structure of the data type function as the input argument T of loadLearnerForCoder.

If filename is 'myMdl', then generateLearnerDataTypeFcn generates a data type function named myMdl_datatype. The myMdl_datatype function supports this syntax:

T = myMdl_datatype(dt)

T = myMdl_datatype(dt) returns a data type structure that defines data types for the variables required to generate fixed-point C/C++ code for prediction of a machine learning model.

Each field of T contains a fixed-point object returned by fi (Fixed-Point Designer). The input argument dt specifies the DataType property of the fixed-point object.

  • Specify dt as 'Fixed' (default) for fixed-point code generation.

  • Specify dt as 'Double' to simulate floating-point behavior of the fixed-point code.

Use the output structure T as the value of the 'DataType' name-value argument of loadLearnerForCoder.

The structure T contains the fields in the following table. These fields define the data types for the variables that directly influence the precision of the model. These variables, along with other named and internal variables, are required to run the predict function of the model.

Description: Fields

Common fields for classification:
  • XDataType (input)

  • ScoreDataType (output or internal variable) and TransformedScoreDataType (output)

    • If you train a model using the default 'ScoreTransform' value of 'none' or 'identity' (that is, you do not transform predicted scores), then the ScoreDataType field influences the precision of the output scores.

    • If you train a model using a value of 'ScoreTransform' other than 'none' or 'identity' (that is, you do transform predicted scores), then the ScoreDataType field influences the precision of the internal untransformed scores. The TransformedScoreDataType field influences the precision of the transformed output scores.

Common fields for regression:
  • XDataType (input)

  • YFitDataType (output)

Additional fields for an ensemble of decision trees:
  • WeakLearnerOutputDataType (internal variable): Data type for outputs from weak learners.

  • AggregatedLearnerWeightsDataType (internal variable): Data type for a weighted aggregate of the outputs from weak learners, applicable only if you train a model using bagging ('Method','bag'). The software computes predicted scores (ScoreDataType) by dividing the aggregate by the sum of learner weights.

Additional fields for SVM:
  • XnormDataType (internal variable), applicable only if you train a model using 'Standardize' or 'KernelScale'

  • InnerProductDataType (internal variable)

The software proposes the maximum fraction length that does not cause overflows, based on the default word length (16) and safety margin (10%) for each variable.

The following code shows the data type function myMdl_datatype, generated by generateLearnerDataTypeFcn when filename is 'myMdl' and the model in the filename file is an SVM classifier.

function T = myMdl_datatype(dt)

if nargin < 1
	dt = 'Fixed';
end

% Set fixed-point math settings
fm = fimath('RoundingMethod','Floor', ...
    'OverflowAction','Wrap', ...
    'ProductMode','FullPrecision', ...
    'MaxProductWordLength',128, ...
    'SumMode','FullPrecision', ...
    'MaxSumWordLength',128);

% Data type for predictor data
T.XDataType = fi([],true,16,14,fm,'DataType',dt);

% Data type for output score
T.ScoreDataType = fi([],true,16,14,fm,'DataType',dt);

% Internal variables
% Data type of the squared distance dist = (x-sv)^2 for the Gaussian kernel G(x,sv) = exp(-dist),
% where x is the predictor data for an observation and sv is a support vector
T.InnerProductDataType = fi([],true,16,6,fm,'DataType',dt);

end

Tips

  • To improve the precision of the generated fixed-point code, you can tune the fixed-point data types. Modify the fixed-point data types by updating the data type function (myMdl_datatype) and creating a new structure, and then regenerate the code using the new structure. You can update the myMdl_datatype function in one of two ways:

    • Regenerate the myMdl_datatype function by using generateLearnerDataTypeFcn and its name-value pair arguments.

      If you increase the word length or decrease the safety margin, the software can propose a longer fraction length, and therefore, improve the precision of the generated code based on the given data set.

    • Manually modify the fixed-point data types in the function file (myMdl_datatype.m). For each variable, you can tune the word length and fraction length and specify fixed-point math settings using a fimath (Fixed-Point Designer) object.

  • In the generated fixed-point code, a large number of operations or a large variable range can result in loss of precision, compared to the precision of the corresponding floating-point code. When training an SVM model, keep the following tips in mind to avoid loss of precision in the generated fixed-point code:

    • Data standardization ('Standardize') — To avoid overflows in the model property values of support vectors in an SVM model, you can standardize the predictor data. Instead of using the 'Standardize' name-value pair argument when training the model, standardize the predictor data before passing the data to the fitting function and the predict function so that the fixed-point code does not include the operations for the standardization.

    • Kernel function ('KernelFunction') — Using the Gaussian kernel or linear kernel is preferable to using a polynomial kernel. A polynomial kernel requires higher computational complexity than the other kernels, and the output of a polynomial kernel function is unbounded.

    • Kernel scale ('KernelScale') — Using a kernel scale requires additional operations if the value of 'KernelScale' is not 1.

    • The prediction of a one-class classification problem might have loss of precision if the predicted class score values have a large range.

  • You can generate a lookup table that approximates a score transformation function of a trained classifier by using a FunctionApproximation.TransformFunction (Fixed-Point Designer) object and its function approximate (Fixed-Point Designer). Then use the lookup table for fixed-point code generation. This approach requires fewer calculations for score transformation in the generated code than the default approach, which uses the CORDIC-based algorithm. Therefore, using a lookup table yields relatively high-speed performance and relatively low memory requirements. The supported score transformation functions include 'doublelogit', 'logit', and 'symmetriclogit'. For an example, see Use Lookup Table to Approximate Score Transformation. (since R2023a)
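
The kernel function tip above can be illustrated with base MATLAB arithmetic. The vectors x and sv, the scale factors, and the cubic polynomial kernel are hypothetical. The Gaussian kernel follows the form G(x,sv) = exp(-dist) used in the comments of the generated data type function.

```matlab
x  = [3 -2 4];       % hypothetical observation
sv = [-1 5 0.5];     % hypothetical support vector

% Gaussian kernel: output always lies in [0, 1]
gaussKernel = @(u, v) exp(-sum((u - v).^2));

% Hypothetical cubic polynomial kernel: output grows with the data scale
polyKernel = @(u, v) (1 + u * v').^3;

% Evaluate both kernels as the data are scaled up
scales = [1 10 100];
g = arrayfun(@(s) gaussKernel(s*x, s*sv), scales);
p = arrayfun(@(s) polyKernel(s*x, s*sv), scales);
```

In this sketch, the Gaussian outputs stay bounded no matter how the inputs are scaled, whereas the polynomial outputs grow from -1000 to beyond 1e15 in magnitude. A range that wide leaves few or no fraction bits in a 16-bit type.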

Version History

Introduced in R2019b
