Deep learning uses neural network architectures that contain many processing layers, including convolutional layers. Deep learning models typically work on large sets of labeled data. Performing inference on these models is computationally intensive and consumes a significant amount of memory. Neural networks use memory to store input data, parameters (weights), and activations from each layer as the input propagates through the network. Deep neural networks trained in MATLAB® use single-precision floating-point data types. Even networks that are small in size require a considerable amount of memory and hardware to perform these floating-point arithmetic operations. These restrictions can inhibit deployment of deep learning models to devices that have low computational power and small memory resources. By using a lower precision to store the weights and activations, you can reduce the memory requirements of the network.
You can use Deep Learning Toolbox™ in tandem with the Deep Learning Toolbox Model Quantization Library support package to reduce the memory footprint of a deep neural network by quantizing the weights, biases, and activations of convolution layers to 8-bit scaled integer data types. Then, you can use MATLAB Coder™ to generate optimized code for the quantized network. The generated code takes advantage of ARM® processor SIMD instructions by using the ARM Compute Library. You can integrate the generated code into your project as source code, static or dynamic libraries, or executables that you can deploy to a variety of ARM CPU platforms such as Raspberry Pi™.
You can generate C++ code that uses the ARM Compute Library and performs inference computations in 8-bit integers for these layers:

- 2-D convolution layer (convolution2dLayer (Deep Learning Toolbox))
- 2-D grouped convolution layer (groupedConvolution2dLayer (Deep Learning Toolbox)). The value of the input argument must be equal to
- Max pooling layer (maxPooling2dLayer (Deep Learning Toolbox))
- Rectified Linear Unit (ReLU) layer (reluLayer (Deep Learning Toolbox))
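As an illustration, a small network built around these supported layers might look like the following sketch. The layer sizes, input dimensions, and training call are assumptions for illustration only; layers outside the list above (such as the fully connected and softmax layers) can still appear in the network, but only the listed layers perform their inference computations in 8-bit integers.

```matlab
% Hypothetical layer array; convolution, ReLU, and max pooling layers
% are the ones that can run in int8 in the generated code.
layers = [
    imageInputLayer([28 28 1])
    convolution2dLayer(3,16,'Padding','same')  % 2-D convolution layer
    reluLayer                                  % ReLU layer
    maxPooling2dLayer(2,'Stride',2)            % max pooling layer
    convolution2dLayer(3,32,'Padding','same')
    reluLayer
    fullyConnectedLayer(10)
    softmaxLayer
    classificationLayer];

% Hypothetical training data XTrain, YTrain.
net = trainNetwork(XTrain,YTrain,layers,trainingOptions('sgdm'));
```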
To generate code that performs inference computations in 8-bit integers, in your deep learning configuration object dlcfg, set these additional properties:

dlcfg.CalibrationResultFile = 'dlquantizerObjectMatFile';
dlcfg.DataType = 'int8';
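Putting these properties in context, a complete code generation configuration might look like the following sketch. The entry-point function name predict_int8 and the input size are assumptions for illustration.

```matlab
cfg = coder.config('lib');        % build a static library
cfg.TargetLang = 'C++';           % ARM Compute Library code generation uses C++

% Target the ARM Compute Library and point to the calibration results.
dlcfg = coder.DeepLearningConfig('arm-compute');
dlcfg.CalibrationResultFile = 'dlquantizerObjectMatFile';
dlcfg.DataType = 'int8';

cfg.DeepLearningConfig = dlcfg;

% Hypothetical entry-point function and example input size.
codegen -config cfg predict_int8 -args {ones(224,224,3,'single')}
```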
Alternatively, in the MATLAB Coder app, on the Deep Learning tab, set Target library to ARM Compute. Then set the Data type and Calibration result file path parameters.
'dlquantizerObjectMatFile' is the name of the MAT-file that dlquantizer (Deep Learning Toolbox) generates for specific calibration data. For the purpose of calibration, set the ExecutionEnvironment property of the dlquantizer object to 'CPU'.
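For example, the calibration step might look like this sketch; the trained network net, the image datastore imds of representative calibration data, and the MAT-file name are assumptions.

```matlab
% Quantize for CPU execution so the calibration results apply to ARM targets.
quantObj = dlquantizer(net,'ExecutionEnvironment','CPU');

% Exercise the network with representative data to collect the dynamic
% ranges of the weights, biases, and activations.
calResults = calibrate(quantObj,imds);

% Save the dlquantizer object; pass this MAT-file name to
% the CalibrationResultFile property of the deep learning configuration.
save('dlquantizerObjectMatFile','quantObj');
```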
Otherwise, follow the steps described in Code Generation for Deep Learning Networks with ARM Compute Library.
For an example, see Code Generation for Quantized Deep Learning Network on Raspberry Pi.
dlquantizer (Deep Learning Toolbox) | dlquantizationOptions (Deep Learning Toolbox) | calibrate (Deep Learning Toolbox) | validate (Deep Learning Toolbox)