quantize

Quantize deep neural network

Since R2022a

Syntax

quantizedNetwork = quantize(quantObj)

quantizedNetwork = quantize(quantObj,Name,Value)

Description

quantizedNetwork = quantize(quantObj) quantizes a deep neural network using a calibrated dlquantizer object, quantObj. The quantized neural network object, quantizedNetwork, enables visibility of the quantized layers, weights, and biases of the network, as well as simulatable quantized inference behavior.

example

quantizedNetwork = quantize(quantObj,Name,Value) specifies additional options using one or more name name-value arguments.

This function requires the Deep Learning Toolbox Model Compression Library. To learn about the products required to quantize a deep neural network, see Quantization Workflow Prerequisites.

example

Examples

collapse all

Emulate Target Agnostic Quantized Network

This example uses:

Open Live Script

This example shows how to create a target agnostic, simulatable quantized deep neural network in MATLAB.

Target agnostic quantization allows you to see the effect quantization has on your neural network without target hardware or target-specific quantization schemes. Creating a target agnostic quantized network is useful if you:

Do not have access to your target hardware.
Want to preview whether or not your network is suitable for quantization.
Want to find layers that are sensitive to quantization.

Quantized networks emulate quantized behavior for quantization-compatible layers. Network architecture like layers and connections are the same as the original network, but inference behavior uses limited precision types. Once you have quantized your network, you can use the quantizationDetails function to retrieve details on what was quantized.

Load the pretrained network. net is a SqueezeNet network that has been retrained using transfer learning to classify images in the MerchData data set.

load squeezedlnetmerch
net

net = 
  dlnetwork with properties:

         Layers: [67×1 nnet.cnn.layer.Layer]
    Connections: [74×2 table]
     Learnables: [52×3 table]
          State: [0×3 table]
     InputNames: {'data'}
    OutputNames: {'prob'}
    Initialized: 1

  View summary with summary.

You can use the quantizationDetails function to see that the network is not quantized.

qDetailsOriginal = quantizationDetails(net)

qDetailsOriginal = struct with fields:
            IsQuantized: 0
          TargetLibrary: ""
    QuantizedLayerNames: [0×0 string]
    QuantizedLearnables: [0×3 table]

Unzip and load the MerchData images as an image datastore and extract the classes from the datastore.

unzip('MerchData.zip')
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
classes = categories(imds.Labels);

Define calibration and validation data to use for quantization. The output size of the images are changed for both calibration and validation data according to network requirements.

[calData,valData] = splitEachLabel(imds,0.7,'randomized');
augCalData = augmentedImageDatastore([227 227],calData);
augValData = augmentedImageDatastore([227 227],valData);

Create dlquantizer object and specify the network to quantize. Set the execution environment to MATLAB. How the network is quantized depends on the execution environment. The MATLAB execution environment is agnostic to the target hardware and allows you to prototype quantized behavior. When you use the MATLAB execution environment, quantization is performed using the fi fixed-point data type which requires a Fixed-Point Designer™ license.

quantObj = dlquantizer(net,'ExecutionEnvironment','MATLAB');

Use the calibrate function to exercise the network with sample inputs and collect range information. The calibrate function exercises the network and collects the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. The function returns a table. Each row of the table contains range information for a learnable parameter of the optimized network.

calResults = calibrate(quantObj,augCalData);

Use the quantize method to quantize the network object and return a simulatable quantized network.

qNet = quantize(quantObj)

qNet = 
  Quantized dlnetwork with properties:

         Layers: [67×1 nnet.cnn.layer.Layer]
    Connections: [74×2 table]
     Learnables: [52×3 table]
          State: [0×3 table]
     InputNames: {'data'}
    OutputNames: {'prob'}
    Initialized: 1

  View summary with summary.
  Use the quantizationDetails function to extract quantization details.

You can use the quantizationDetails function to see that the network is now quantized.

qDetailsQuantized = quantizationDetails(qNet)

qDetailsQuantized = struct with fields:
            IsQuantized: 1
          TargetLibrary: "none"
    QuantizedLayerNames: [53×1 string]
    QuantizedLearnables: [52×3 table]

Make predictions using the original, single-precision floating-point network, and the quantized INT8 network.

origScores = minibatchpredict(net,augValData);
predOriginal = scores2label(origScores,classes);    % Predictions for the non-quantized network

qScores = minibatchpredict(qNet,augValData);
predQuantized = scores2label(qScores,classes);     % Predictions for the quantized network

Compute the relative accuracy of the quantized network as compared to the original network.

ccrQuantized = mean(squeeze(predQuantized) == valData.Labels)*100

ccrQuantized = 100

ccrOriginal = mean(squeeze(predOriginal) == valData.Labels)*100

ccrOriginal = 100

For this validation data set, the quantized network gives the same predictions as the floating-point network.

Emulate GPU Target Behavior for Quantized Network

This example uses:

Open Live Script

This example shows how to emulate the behavior of a quantized network for GPU deployment. Once you quantize your network for a GPU execution environment, you can emulate the GPU target behavior without the GPU hardware. Doing so allows you to examine your quantized network structure and behavior without generating code for deployment.

Emulated quantized networks are not smaller than the original network.

Load the pretrained network. net is a SqueezeNet convolutional neural network that has been retrained using transfer learning to classify images in the MerchData data set.

load squeezedlnetmerch
net

net = 
  dlnetwork with properties:

         Layers: [67×1 nnet.cnn.layer.Layer]
    Connections: [74×2 table]
     Learnables: [52×3 table]
          State: [0×3 table]
     InputNames: {'data'}
    OutputNames: {'prob'}
    Initialized: 1

  View summary with summary.

Define calibration and validation data to use for quantization.

Use the calibration data to collect the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. For the best quantization results, the calibration data must be representative of inputs to the network.

Use the validation data to test the network after quantization to understand the effects of the limited range and precision of the quantized convolution layers in the network.

For this example, use the images in the MerchData data set. Split the data into calibration and validation data sets.

unzip("MerchData.zip");
imds = imageDatastore("MerchData", ...
    IncludeSubfolders=true, ...
    LabelSource="foldernames");
classes = categories(imds.Labels);
[calData,valData] = splitEachLabel(imds,0.7,"randomized");

Create a dlquantizer object and specify the network to quantize. How the network is quantized depends on the execution environment. Set ExecutionEnvironment to GPU to perform quantization specific to GPU target hardware.

quantObj = dlquantizer(net,ExecutionEnvironment="GPU");

Use the calibrate function to exercise the network object with sample inputs and collect range information.

calResults = calibrate(quantObj,calData);

Use the quantize method to quantize the network object and return a simulatable quantized network.

qNet = quantize(quantObj)

qNet = 
  Quantized dlnetwork with properties:

         Layers: [67×1 nnet.cnn.layer.Layer]
    Connections: [74×2 table]
     Learnables: [52×3 table]
          State: [0×3 table]
     InputNames: {'data'}
    OutputNames: {'prob'}
    Initialized: 1

  View summary with summary.
  Use the quantizationDetails function to extract quantization details.

You can use the quantizationDetails method to see that the network is now quantized.

qDetails = quantizationDetails(qNet)

qDetails = struct with fields:
            IsQuantized: 1
          TargetLibrary: "cudnn"
    QuantizedLayerNames: [55×1 string]
    QuantizedLearnables: [35×3 table]

The TargetLibrary field shows that the quantized network emulates the CUDA® Deep Neural Network library (cuDNN).

The QuantizedLayerNames field displays a list of layers that have been quantized.

qDetails.QuantizedLayerNames(1:5)

ans = 5×1 string
    "conv1"
    "relu_conv1"
    "pool1"
    "fire2-squeeze1x1"
    "fire2-relu_squeeze1x1"

The QuantizedLearnables field contains additional details on quantized network learnable parameters. In this example, the 2-D convolutional layer, conv1, has had the weights scaled and cast to int8. The bias is scaled and remains in single precision. The values of quantized learnables are returned as stored integer values.

qDetails.QuantizedLearnables

ans=35×3 table
          Layer           Parameter           Value       
    __________________    _________    ___________________

    "conv1"               "Weights"    {3×3×3×64   int8  }
    "conv1"               "Bias"       {1×1×64     single}
    "fire2-squeeze1x1"    "Weights"    {1×1×64×16  int8  }
    "fire2-squeeze1x1"    "Bias"       {1×1×16     single}
    "fire2-expand1x1"     "Weights"    {1×1×16×64  int8  }
    "fire2-expand3x3"     "Weights"    {3×3×16×64  int8  }
    "fire3-squeeze1x1"    "Weights"    {1×1×128×16 int8  }
    "fire3-squeeze1x1"    "Bias"       {1×1×16     single}
    "fire3-expand1x1"     "Weights"    {1×1×16×64  int8  }
    "fire3-expand3x3"     "Weights"    {3×3×16×64  int8  }
    "fire4-squeeze1x1"    "Weights"    {1×1×128×32 int8  }
    "fire4-squeeze1x1"    "Bias"       {1×1×32     single}
    "fire4-expand1x1"     "Weights"    {1×1×32×128 int8  }
    "fire4-expand3x3"     "Weights"    {3×3×32×128 int8  }
    "fire5-squeeze1x1"    "Weights"    {1×1×256×32 int8  }
    "fire5-squeeze1x1"    "Bias"       {1×1×32     single}
      ⋮

You can use the quantized network to emulate how a network quantized for GPU target hardware would perform a classification task.

Make predictions using the original, single-precision floating-point network. To accelerate the computation by compiling and executing a MEX function on the GPU, use the acceleration option "mex" of the predict function.

XTest = readall(valData);
XTest = cat(4,XTest{:});
XTest = dlarray(gpuArray(single(XTest)),"SSCB");                        
TTest = valData.Labels;

YTestOriginal = predict(net,XTest,Acceleration="mex");

Generating MEX for cudnn target.

YTestOriginal = onehotdecode(YTestOriginal,classes,3);

Make predictions using the quantized INT8 network. Use the acceleration option "mex" of the predict function. MEX acceleration is supported for quantized networks based on quantization objects with ExecutionEnvironment set to GPU.

YTestQuantized = predict(qNet,XTest,Acceleration="mex");

Generating MEX for cudnn target.

YTestQuantized = onehotdecode(YTestQuantized,classes,3);

Compute the relative accuracy of the quantized network as compared to the original network.

ccrOriginal = mean(squeeze(YTestOriginal) == valData.Labels)

ccrOriginal = 
1

ccrQuantized = mean(squeeze(YTestQuantized) == valData.Labels)

ccrQuantized = 
1

The quantized network shows no drop in accuracy.

Emulate FPGA Target Behavior for Quantized Network

This example uses:

Open Live Script

This example shows how to emulate the behavior of a quantized network for FPGA deployment. Once you quantize your network for an FPGA execution environment, you can emulate the FPGA target behavior without any FPGA hardware. This action allows you to examine your quantized network structure and behavior without generating code for deployment.

This example uses a simple convolutional neural network that is to trained to classify user-written digits from 0 to 9. To learn more about how this network was trained, see Parameter Pruning and Quantization of Image Classification Network.

Load the pretrained network.

load trainedDigitsNet.mat

Define calibration and validation data to use for quantization.

Use the calibration data to collect the dynamic ranges of the weights and biases in the fully connected layers, the dynamic ranges of the activations in all the layers, and the dynamic ranges of the parameters for some layers. For the best quantization results, the calibration data must be representative of inputs to the network.

Use the validation data to test the network after quantization. Test the network to determine the effects of the limited range and precision of the quantized layers and layer parameters in the network.

This example uses the images in the Digits data set. Create an imageDatastore object, then split the data into calibration and validation data sets.

digitDatasetPath = fullfile(matlabroot,'toolbox','nnet','nndemos', ...
    'nndatasets','DigitDataset');
imds = imageDatastore(digitDatasetPath, ...
    'IncludeSubfolders',true,'LabelSource','foldernames');
classes = categories(imds.Labels);
[calData,valData] = splitEachLabel(imds,0.75,"randomized");

Create a dlquantizer object and specify the network to quantize. Set the execution environment for the quantized network to FPGA.

quantObj = dlquantizer(dlnet,ExecutionEnvironment="FPGA");

Use the calibrate function to exercise the network with sample inputs and collect range information.

calResults = calibrate(quantObj,calData,UseGPU="off");

Use the quantize function to quantize the network object and return a quantized network for simulation.

qNet = quantize(quantObj)

qNet = 
  Quantized dlnetwork with properties:

         Layers: [10×1 nnet.cnn.layer.Layer]
    Connections: [9×2 table]
     Learnables: [8×3 table]
          State: [0×3 table]
     InputNames: {'imageinput'}
    OutputNames: {'fc'}
    Initialized: 1

  View summary with summary.
  Use the quantizationDetails function to extract quantization details.

Use the quantizationDetails method to extract quantization details.

You can use the quantizationDetails function to confirm that the network is now quantized. The TargetLibrary field shows that the quantized network emulates an FPGA target.

qDetails = quantizationDetails(qNet)

qDetails = struct with fields:
            IsQuantized: 1
          TargetLibrary: "fpga"
    QuantizedLayerNames: [9×1 string]
    QuantizedLearnables: [8×3 table]

The QuantizedLayerNames field displays a list of quantized layers.

qDetails.QuantizedLayerNames

ans = 9×1 string
    "conv_1"
    "relu_1"
    "maxpool_1"
    "conv_2"
    "relu_2"
    "maxpool_2"
    "conv_3"
    "relu_3"
    "fc"

The QuantizedLearnables field contains additional details about the quantized network learnable parameters. In this example, the 2-D convolutional layers and fully connected layers have their weights scaled and cast to int8. The bias is scaled and remains in int32. The quantizationDetails function returns the values of the quantized learnables as stored integer values.

qDetails.QuantizedLearnables

ans=8×3 table
     Layer      Parameter          Value       
    ________    _________    __________________

    "conv_1"    "Weights"    { 3×3×1×8   int8 }
    "conv_1"    "Bias"       { 1×1×8     int32}
    "conv_2"    "Weights"    { 3×3×8×16  int8 }
    "conv_2"    "Bias"       { 1×1×16    int32}
    "conv_3"    "Weights"    { 3×3×16×32 int8 }
    "conv_3"    "Bias"       { 1×1×32    int32}
    "fc"        "Weights"    {10×1568    int8 }
    "fc"        "Bias"       {10×1       int32}

You can use the quantized network to emulate a network quantized for FPGA target hardware performing a classification task.

ccr = testnet(qNet,valData,"accuracy")

ccr = 
99.1600

Input Arguments

collapse all

`quantObj` — Network to quantize
`dlquantizer` object

dlquantizer object containing the network to quantize, calibrated using the calibrate object function. The ExecutionEnvironment must be set to 'GPU' 'FPGA', or 'MATLAB'.

Name-Value Arguments

collapse all

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Example: quantizedNetwork = quantize(quantObj,'ExponentScheme','Histogram')

`ExponentScheme` — Exponent selection scheme
`'MinMax'` (default) | `'Histogram'`

Exponent selection scheme, specified as one of these values:

'MinMax' — Evaluate the exponent based on the range information in the calibration statistics and avoid overflows.
'Histogram' — Distribution-based scaling which evaluates the exponent to best fit the calibration data.

Example: 'ExponentScheme','Histogram'

Output Arguments

collapse all

`quantizedNetwork` — Quantized neural network
`dlnetwork` object | `DAGNetwork` object | `yolov2ObjectDetector` object | `yolov3ObjectDetector` object | `yolov4ObjectDetector` object | `ssdObjectDetector` object

Quantized neural network, returned as a dlnetwork, DAGNetwork, yolov2ObjectDetector (Computer Vision Toolbox), yolov3ObjectDetector (Computer Vision Toolbox), yolov4ObjectDetector (Computer Vision Toolbox), or a ssdObjectDetector (Computer Vision Toolbox) object.

Limitations

The quantize function does not support quantization of networks using dlquantizer objects with ExecutionEnvironment set to 'CPU'.
Code generation does not support quantized deep neural networks produced by the quantize function.

Version History

Introduced in R2022a

expand all

R2023a: Quantize `dlquantizer` objects that specify a `dlnetwork`

The quantize function now supports quantization of dlnetwork objects using a calibration dlquantizer object.

R2022b: `quantize` support for FPGA execution environment

Use the quantize method to create a simulatable quantized network when the ExecutionEnvironment property of dlquantizer is set to FPGA. The simulatable quantized network enables visibility of the quantized layers, weights, and biases of the network, as well as simulatable quantized inference behavior.

R2022a: Quantize `dlquantizer` objects calibrated in R2022a and later

The quantize function supports quantization of dlquantizer objects that are calibrated in R2022a and later.

quantize

Syntax

Description

Examples

Emulate Target Agnostic Quantized Network

Emulate GPU Target Behavior for Quantized Network

Emulate FPGA Target Behavior for Quantized Network

Input Arguments

`quantObj` — Network to quantize
`dlquantizer` object

Name-Value Arguments

`ExponentScheme` — Exponent selection scheme
`'MinMax'` (default) | `'Histogram'`

Output Arguments

`quantizedNetwork` — Quantized neural network
`dlnetwork` object | `DAGNetwork` object | `yolov2ObjectDetector` object | `yolov3ObjectDetector` object | `yolov4ObjectDetector` object | `ssdObjectDetector` object

Limitations

Version History

R2023a: Quantize `dlquantizer` objects that specify a `dlnetwork`

R2022b: `quantize` support for FPGA execution environment

R2022a: Quantize `dlquantizer` objects calibrated in R2022a and later

See Also

Apps

Functions

Topics

quantize

Syntax

Description

Examples

Emulate Target Agnostic Quantized Network

Emulate GPU Target Behavior for Quantized Network

Emulate FPGA Target Behavior for Quantized Network

Input Arguments

quantObj — Network to quantize dlquantizer object

Name-Value Arguments

ExponentScheme — Exponent selection scheme 'MinMax' (default) | 'Histogram'

Output Arguments

quantizedNetwork — Quantized neural network dlnetwork object | DAGNetwork object | yolov2ObjectDetector object | yolov3ObjectDetector object | yolov4ObjectDetector object | ssdObjectDetector object

Limitations

Version History

R2023a: Quantize dlquantizer objects that specify a dlnetwork

R2022b: quantize support for FPGA execution environment

R2022a: Quantize dlquantizer objects calibrated in R2022a and later

See Also

Apps

Functions

Topics

`quantObj` — Network to quantize
`dlquantizer` object

`ExponentScheme` — Exponent selection scheme
`'MinMax'` (default) | `'Histogram'`

`quantizedNetwork` — Quantized neural network
`dlnetwork` object | `DAGNetwork` object | `yolov2ObjectDetector` object | `yolov3ObjectDetector` object | `yolov4ObjectDetector` object | `ssdObjectDetector` object

R2023a: Quantize `dlquantizer` objects that specify a `dlnetwork`

R2022b: `quantize` support for FPGA execution environment

R2022a: Quantize `dlquantizer` objects calibrated in R2022a and later