
Emulate Quantized Network Behavior for GPU Target

After you quantize your network for a GPU execution environment, you can emulate the behavior of the GPU target without the GPU hardware. Doing so allows you to examine the structure and behavior of your quantized network without generating code for deployment. An emulated quantized network is not smaller than the original network.

After quantizing your network using dlquantizer with the ExecutionEnvironment property set to 'GPU', you can emulate the behavior of your network when deployed to a GPU target. This option emulates quantization by scaling, saturating, and rounding single-precision data so that it behaves like int8 data.
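
For intuition, the following is a minimal sketch of the general idea: scale, round, and saturate single-precision values to the int8 range, then rescale. This illustration is not the dlquantizer implementation, and scaleFactor is a hypothetical value.

x = single(randn(1,5));                       % example single-precision data
scaleFactor = single(2^-5);                   % hypothetical power-of-two scale factor
q = max(min(round(x/scaleFactor),127),-128);  % round, then saturate to the int8 range
xEmulated = q*scaleFactor                     % values behave like dequantized int8 data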

Emulate GPU Target Behavior for Quantized Network

This example shows how to emulate the behavior of a quantized network for GPU deployment. The example uses the pretrained SqueezeNet convolutional neural network to demonstrate the quantization workflow.

First, load the pretrained SqueezeNet network.

load squeezenetmerch
net
net = 
  DAGNetwork with properties:

         Layers: [68×1 nnet.cnn.layer.Layer]
    Connections: [75×2 table]
     InputNames: {'data'}
    OutputNames: {'new_classoutput'}

Define calibration and validation data to use for quantization.

Use the calibration data to collect the dynamic ranges of the weights and biases in the convolution and fully connected layers of the network and the dynamic ranges of the activations in all layers of the network. For the best quantization results, the calibration data must be representative of inputs to the network.

Use the validation data to test the network after quantization to understand the effects of the limited range and precision of the quantized convolution layers in the network.

For this example, use the images in the MerchData data set. Define an augmentedImageDatastore object to resize the data for the network. Then, split the data into calibration and validation data sets.

unzip('MerchData.zip');
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');
[calData, valData] = splitEachLabel(imds, 0.7, 'randomized');
aug_calData = augmentedImageDatastore([227 227], calData);
aug_valData = augmentedImageDatastore([227 227], valData);

Create a dlquantizer object, specifying the network to quantize and the GPU execution environment.

quantObj = dlquantizer(net,'ExecutionEnvironment','GPU');

Calibrate the network.

calResults = calibrate(quantObj, aug_calData);
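
The calibrate function returns a table containing the collected minimum and maximum values. As a quick check, you can display the first few rows, assuming the table contains at least five rows:

calResults(1:5,:)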

Use the quantize method to create an emulated version of the quantized network.

qNet = quantize(quantObj)
qNet = 
Quantized DAGNetwork with properties:

         Layers: [68×1 nnet.cnn.layer.Layer]
    Connections: [75×2 table]
     InputNames: {'data'}
    OutputNames: {'new_classoutput'}

Use the quantizationDetails method to extract quantization details.

qDetails = quantizationDetails(qNet) 
qDetails = struct with fields:
            IsQuantized: 1
          TargetLibrary: "cudnn"
    QuantizedLayerNames: [55×1 string]
    QuantizedLearnables: [35×3 table]

The TargetLibrary value "cudnn" indicates that the quantized network emulates the CUDA® Deep Neural Network library (cuDNN).

Display the names of the first five quantized layers.

qDetails.QuantizedLayerNames(1:5)
ans = 5×1 string
    "conv1"
    "relu_conv1"
    "pool1"
    "fire2-squeeze1x1"
    "fire2-relu_squeeze1x1"

Batch normalization and ReLU layers are also marked as quantized when they are fused with a convolution layer.
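
For example, you can confirm that the fused ReLU layer shown above is reported as quantized. This check is a minimal sketch using the layer name from the previous output:

any(qDetails.QuantizedLayerNames == "relu_conv1")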

Examine the quantized learnable parameters of the network. The quantized weights are stored as int8 values, while the biases remain in single precision.

qDetails.QuantizedLearnables
ans=35×3 table
          Layer           Parameter           Value       
    __________________    _________    ___________________

    "conv1"               "Weights"    {3×3×3×64   int8  }
    "conv1"               "Bias"       {1×1×64     single}
    "fire2-squeeze1x1"    "Weights"    {1×1×64×16  int8  }
    "fire2-squeeze1x1"    "Bias"       {1×1×16     single}
    "fire2-expand1x1"     "Weights"    {1×1×16×64  int8  }
    "fire2-expand3x3"     "Weights"    {3×3×16×64  int8  }
    "fire3-squeeze1x1"    "Weights"    {1×1×128×16 int8  }
    "fire3-squeeze1x1"    "Bias"       {1×1×16     single}
    "fire3-expand1x1"     "Weights"    {1×1×16×64  int8  }
    "fire3-expand3x3"     "Weights"    {3×3×16×64  int8  }
    "fire4-squeeze1x1"    "Weights"    {1×1×128×32 int8  }
    "fire4-squeeze1x1"    "Bias"       {1×1×32     single}
    "fire4-expand1x1"     "Weights"    {1×1×32×128 int8  }
    "fire4-expand3x3"     "Weights"    {3×3×32×128 int8  }
    "fire5-squeeze1x1"    "Weights"    {1×1×256×32 int8  }
    "fire5-squeeze1x1"    "Bias"       {1×1×32     single}
      ⋮
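
You can extract an entry from this table to confirm the stored data type. In this sketch, wConv1 is a hypothetical variable name:

wConv1 = qDetails.QuantizedLearnables.Value{1};   % conv1 weights from the first table row
class(wConv1)                                     % returns 'int8'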

Use the quantized network to classify the validation data and compute the classification accuracy. Because the network expects 227-by-227 input images, classify using the augmented image datastore.

ypred = classify(qNet, aug_valData);
ccr = mean(ypred == valData.Labels)
ccr = 1

With this quantized network, there is no drop in classification accuracy on the validation data.
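
For comparison, you can compute the accuracy of the original single-precision network on the same validation data. The variable names ypredFP and ccrFP are hypothetical:

ypredFP = classify(net, aug_valData);          % original, unquantized network
ccrFP = mean(ypredFP == valData.Labels)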
