Classify Images on FPGA by Using Quantized GoogLeNet Network

This example uses:

Deep Learning HDL Toolbox Deep Learning HDL Toolbox
Deep Learning Toolbox Deep Learning Toolbox
Deep Learning HDL Toolbox Support Package for Intel FPGA and SoC Devices Deep Learning HDL Toolbox Support Package for Intel FPGA and SoC Devices
Deep Learning Toolbox Model Compression Library Deep Learning Toolbox Model Compression Library
Deep Learning Toolbox Model for GoogLeNet Network Deep Learning Toolbox Model for GoogLeNet Network
Image Processing Toolbox Image Processing Toolbox
MATLAB Coder Interface for Deep Learning MATLAB Coder Interface for Deep Learning

This example shows how to use the Deep Learning HDL Toolbox™ to deploy a quantized GoogleNet network to classify an image. The example uses the pretrained GoogLeNet network to demonstrate transfer learning, quantization, and deployment for the quantized network. Quantization helps reduce the memory requirement of a deep neural network by quantizing weights, biases and activations of network layers to 8-bit scaled integer data types. Use MATLAB® to retrieve the prediction results.

Deploy the quantized GoogLeNet network by creating a dlhdl.Workflow object. Use the dlhdl.Workflow object to:

Generate a list of instructions, weights and biases by using the compile method.
Generate a programming file for the FPGA by using the deploy method.
Retrieve the network prediction results and performance by using the predict method.

GoogLeNet has been trained on over a million images and can classify images into 1000 object categories (such as keyboard, coffee mug, pencil, and many animals). The network has learned rich feature representations for a wide range of images. The network takes an image as input, and then outputs a label for the object in the image together with the probabilities for each of the object categories.

Prerequisites

Intel® Arria10 SoC development kit

Transfer Learning Using GoogLeNet

To perform classification on a new set of images, you fine-tune a pretrained GoogLeNet convolutional neural network by transfer learning. In transfer learning, you can take a pretrained network and use it as a starting point to learn a new task. Fine-tuning a network with transfer learning is usually much faster and easier than training a network with randomly initialized weights from scratch. You can quickly transfer learned features to a new task using a smaller number of training images.

Load Pretrained DAG Network

Load the pretrained dlnetwork, GoogLeNet.

net = imagePretrainedNetwork('googlenet');

Use the analyzeNetwork function to obtain information about the network layers.

analyzeNetwork(net);

The first layer, the image input layer, requires input images of size 224-by-224-by-3, where 3 is the number of color channels.

inputSize = net.Layers(1).InputSize

inputSize = 1×3

   224   224     3

Define Training and Validation Data Sets

This example uses the MathWorks® MerchData data set. This is a small data set containing 75 images of MathWorks merchandise, belonging to five different classes (cap, cube, playing cards, screwdriver, and torch).

unzip('MerchData.zip');
imds = imageDatastore('MerchData', ...
    'IncludeSubfolders',true, ...
    'LabelSource','foldernames');

Divide the data into training and validation data sets. Use 70% of the images for training and 30% for validation. splitEachLabel splits the images datastore into two new datastores.

[imdsTrain,imdsValidation] = splitEachLabel(imds,0.7,'randomized');

This data set now contains 55 training images and 20 validation images. Display some sample images.

numTrainImages = numel(imdsTrain.Labels);
idx = randperm(numTrainImages,16);
figure
for i = 1:16
    subplot(4,4,i)
    I = readimage(imdsTrain,idx(i));
    imshow(I)
end

Replace Final Layers

The fully connected layer of the pretrained network net is configured for 1000 classes. This layer, loss3-classifier in GoogLeNet, contains information on how to combine the features that the network extracts into class probabilities. To retrain a pretrained network to classify new images, replace this layer with a new layer adapted to the data set.

Replace the fully connected layer with a new fully connected layer that has number of outputs equal to the number of classes. To make learning faster in the new layers than in the transferred layers, increase the WeightLearnRateFactor and BiasLearnRateFactor values of the fully connected layer.

numClasses = numel(categories(imdsTrain.Labels))

numClasses = 
5

Remove 'loss3-classifier' from the network and replace it with a new fully-connected layer.

newLearnableLayer = fullyConnectedLayer(numClasses,'WeightLearnRateFactor',20,'BiasLearnRateFactor',20,'Name','newFC');
net = replaceLayer(net,'loss3-classifier',newLearnableLayer);

Train Network

The network requires input images of size 224-by-224-by-3, but the images in the image datastores have different sizes. Use an augmented image datastore to automatically resize the training images. Specify additional augmentation operations to perform on the training images: randomly flip the training images along the vertical axis, and randomly translate them up to 30 pixels horizontally and vertically. Data augmentation helps prevent the network from over-fitting and memorizing the exact details of the training images.

pixelRange = [-30 30];
imageAugmenter = imageDataAugmenter( ...
    'RandXReflection',true, ...
    'RandXTranslation',pixelRange, ...
    'RandYTranslation',pixelRange);
augimdsTrain = augmentedImageDatastore(inputSize(1:2),imdsTrain, ...
    'DataAugmentation',imageAugmenter);

To automatically resize the validation images without performing further data augmentation, use an augmented image datastore without specifying any additional preprocessing operations.

augimdsValidation = augmentedImageDatastore(inputSize(1:2),imdsValidation);

Specify the training options. For transfer learning, keep the features from the early layers of the pretrained network (the transferred layer weights). To slow down learning in the transferred layers, set the initial learning rate to a small value. In the previous step, the learning rate factors were increased for the fully connected layer to speed up learning in the new final layers. This combination of learning rate settings results in fast learning only in the new layers and slower learning in the other layers. When performing transfer learning, you do not need to train for as many epochs. An epoch is a full training cycle on the entire training data set. Specify the mini-batch size to be 11. The software validates the network every ValidationFrequency iterations during training.

options = trainingOptions('sgdm', ...
    'MiniBatchSize',11, ...
    'MaxEpochs',5, ...
    'InitialLearnRate',2e-4, ...
    'Shuffle','every-epoch', ...
    'ValidationData',augimdsValidation, ...
    'ValidationFrequency',3, ...
    'Verbose',false, ...
    'Plots','training-progress');

Train the network that consists of the transferred and new layers. By default, trainnet uses a GPU if one is available (requires Parallel Computing Toolbox™ and a supported GPU device. Otherwise, the network uses a CPU (requires MATLAB Coder™ Interface for Deep Learning Libraries™). You can also specify the execution environment by using the 'ExecutionEnvironment' name-value argument of trainingOptions.

netTransfer = trainnet(augimdsTrain,net,'crossentropy',options);

Create dlquantizer Object

Create a quantized network by using the dlquantizer object. Set the target execution environment to FPGA.

dlQuantObj = dlquantizer(netTransfer,'ExecutionEnvironment','FPGA');

Calibrate Quantized Network

Use the calibrate function to exercise the network by using sample inputs to collect the range information. The calibrate function exercises the network and collects the dynamic ranges for the learnable parameters of the convolution and fully connected layers of the network.

For best quantization results, the calibration data must be a representative of actual inputs that are predicted by the network.

dlQuantObj.calibrate(augimdsTrain);

Set Up Intel Quartus Prime Standard

Set the synthesis tool path to point to an installed Intel Quartus® Prime Standard Edition 20.1 executable file. You must have already installed Altera® Quartus II.

% hdlsetuptoolpath('ToolName','Altera Quartus II','ToolPath','C:\intel\20.1\quartus\bin\quartus.exe');

Create Target Object

Create a target object with a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet.

hTarget = dlhdl.Target('Intel','Interface','JTAG');

Generate Bitstream to Run Network

The GoogleNet network consists of multiple Cross Channel Normalization layers. To support this layer on hardware, the 'LRNBlockGeneration' property of the conv module needs to be turned on in the bitstream used for FPGA inference. The shipping arria10soc_int8 bitstream does not have 'LRNBlockGeneration' property turned on. A new bitstream can be generated using the following lines of code. The generated bitstream can be used along with a workflow object for inference.

Update the processor configuration with 'LRNBlockGeneration' property turned on and 'SegmentationBlockGeneration' property turned off. Turn off 'SegmentationBlockGeneration' to fit the Deep Learning IP on the FPGA and avoid overutilization of resources.

% hPC = dlhdl.ProcessorConfig('Bitstream', 'arria10soc_int8');
% hPC.setModuleProperty('conv', 'LRNBlockGeneration', 'on');
% hPC.setModuleProperty('conv', 'SegmentationBlockGeneration', 'off');
% dlhdl.buildProcessor(hPC)

To learn how to use the generated bitstream file, see Generate Custom Bitstream.

Create Workflow Object

Create an object of the dlhdl.Workflow class. Specify dlQuantObj as the network. Make sure to use the generated bitstream which enables processing of Cross Channel Normalization layers on FPGA. In this example, the target FPGA board is the Intel Arria10 SOC board and the generated bitstream uses the int8 data type.

hW = dlhdl.Workflow('network', dlQuantObj, 'Bitstream', 'dlprocessor.sof','Target',hTarget);

Compile Workflow Object

To compile the GoogLeNet network, run the compile function of the dlhdl.Workflow object.

dn = hW.compile

### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream /home/bamorin/Documents/MATLAB/ExampleManager/bamorin.InvestigateFailures/deeplearning_shared-ex40925327/dlprocessor.sof.
### An output layer called 'Output1_prob' of type 'nnet.cnn.layer.RegressionOutputLayer' has been added to the provided network. This layer performs no operation during prediction and thus does not affect the output of the network.
### The network includes the following layers:

### Notice: The layer 'data' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'prob' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'Output1_prob' with type 'nnet.cnn.layer.RegressionOutputLayer' is implemented in software.
### Compiling layer group: conv1-7x7_s2>>pool2-3x3_s2 ...
### Compiling layer group: conv1-7x7_s2>>pool2-3x3_s2 ... complete.
### Compiling layer group: inception_3a-1x1>>inception_3a-relu_1x1 ...
### Compiling layer group: inception_3a-1x1>>inception_3a-relu_1x1 ... complete.
### Compiling layer group: inception_3a-3x3_reduce>>inception_3a-relu_3x3 ...
### Compiling layer group: inception_3a-3x3_reduce>>inception_3a-relu_3x3 ... complete.
### Compiling layer group: inception_3a-5x5_reduce>>inception_3a-relu_5x5 ...
### Compiling layer group: inception_3a-5x5_reduce>>inception_3a-relu_5x5 ... complete.
### Compiling layer group: inception_3a-pool>>inception_3a-relu_pool_proj ...
### Compiling layer group: inception_3a-pool>>inception_3a-relu_pool_proj ... complete.
### Compiling layer group: inception_3b-1x1>>inception_3b-relu_1x1 ...
### Compiling layer group: inception_3b-1x1>>inception_3b-relu_1x1 ... complete.
### Compiling layer group: inception_3b-3x3_reduce>>inception_3b-relu_3x3 ...
### Compiling layer group: inception_3b-3x3_reduce>>inception_3b-relu_3x3 ... complete.
### Compiling layer group: inception_3b-5x5_reduce>>inception_3b-relu_5x5 ...
### Compiling layer group: inception_3b-5x5_reduce>>inception_3b-relu_5x5 ... complete.
### Compiling layer group: inception_3b-pool>>inception_3b-relu_pool_proj ...
### Compiling layer group: inception_3b-pool>>inception_3b-relu_pool_proj ... complete.
### Compiling layer group: pool3-3x3_s2 ...
### Compiling layer group: pool3-3x3_s2 ... complete.
### Compiling layer group: inception_4a-1x1>>inception_4a-relu_1x1 ...
### Compiling layer group: inception_4a-1x1>>inception_4a-relu_1x1 ... complete.
### Compiling layer group: inception_4a-3x3_reduce>>inception_4a-relu_3x3 ...
### Compiling layer group: inception_4a-3x3_reduce>>inception_4a-relu_3x3 ... complete.
### Compiling layer group: inception_4a-5x5_reduce>>inception_4a-relu_5x5 ...
### Compiling layer group: inception_4a-5x5_reduce>>inception_4a-relu_5x5 ... complete.
### Compiling layer group: inception_4a-pool>>inception_4a-relu_pool_proj ...
### Compiling layer group: inception_4a-pool>>inception_4a-relu_pool_proj ... complete.
### Compiling layer group: inception_4b-1x1>>inception_4b-relu_1x1 ...
### Compiling layer group: inception_4b-1x1>>inception_4b-relu_1x1 ... complete.
### Compiling layer group: inception_4b-3x3_reduce>>inception_4b-relu_3x3 ...
### Compiling layer group: inception_4b-3x3_reduce>>inception_4b-relu_3x3 ... complete.
### Compiling layer group: inception_4b-5x5_reduce>>inception_4b-relu_5x5 ...
### Compiling layer group: inception_4b-5x5_reduce>>inception_4b-relu_5x5 ... complete.
### Compiling layer group: inception_4b-pool>>inception_4b-relu_pool_proj ...
### Compiling layer group: inception_4b-pool>>inception_4b-relu_pool_proj ... complete.
### Compiling layer group: inception_4c-1x1>>inception_4c-relu_1x1 ...
### Compiling layer group: inception_4c-1x1>>inception_4c-relu_1x1 ... complete.
### Compiling layer group: inception_4c-3x3_reduce>>inception_4c-relu_3x3 ...
### Compiling layer group: inception_4c-3x3_reduce>>inception_4c-relu_3x3 ... complete.
### Compiling layer group: inception_4c-5x5_reduce>>inception_4c-relu_5x5 ...
### Compiling layer group: inception_4c-5x5_reduce>>inception_4c-relu_5x5 ... complete.
### Compiling layer group: inception_4c-pool>>inception_4c-relu_pool_proj ...
### Compiling layer group: inception_4c-pool>>inception_4c-relu_pool_proj ... complete.
### Compiling layer group: inception_4d-1x1>>inception_4d-relu_1x1 ...
### Compiling layer group: inception_4d-1x1>>inception_4d-relu_1x1 ... complete.
### Compiling layer group: inception_4d-3x3_reduce>>inception_4d-relu_3x3 ...
### Compiling layer group: inception_4d-3x3_reduce>>inception_4d-relu_3x3 ... complete.
### Compiling layer group: inception_4d-5x5_reduce>>inception_4d-relu_5x5 ...
### Compiling layer group: inception_4d-5x5_reduce>>inception_4d-relu_5x5 ... complete.
### Compiling layer group: inception_4d-pool>>inception_4d-relu_pool_proj ...
### Compiling layer group: inception_4d-pool>>inception_4d-relu_pool_proj ... complete.
### Compiling layer group: inception_4e-1x1>>inception_4e-relu_1x1 ...
### Compiling layer group: inception_4e-1x1>>inception_4e-relu_1x1 ... complete.
### Compiling layer group: inception_4e-3x3_reduce>>inception_4e-relu_3x3 ...
### Compiling layer group: inception_4e-3x3_reduce>>inception_4e-relu_3x3 ... complete.
### Compiling layer group: inception_4e-5x5_reduce>>inception_4e-relu_5x5 ...
### Compiling layer group: inception_4e-5x5_reduce>>inception_4e-relu_5x5 ... complete.
### Compiling layer group: inception_4e-pool>>inception_4e-relu_pool_proj ...
### Compiling layer group: inception_4e-pool>>inception_4e-relu_pool_proj ... complete.
### Compiling layer group: pool4-3x3_s2 ...
### Compiling layer group: pool4-3x3_s2 ... complete.
### Compiling layer group: inception_5a-1x1>>inception_5a-relu_1x1 ...
### Compiling layer group: inception_5a-1x1>>inception_5a-relu_1x1 ... complete.
### Compiling layer group: inception_5a-3x3_reduce>>inception_5a-relu_3x3 ...
### Compiling layer group: inception_5a-3x3_reduce>>inception_5a-relu_3x3 ... complete.
### Compiling layer group: inception_5a-5x5_reduce>>inception_5a-relu_5x5 ...
### Compiling layer group: inception_5a-5x5_reduce>>inception_5a-relu_5x5 ... complete.
### Compiling layer group: inception_5a-pool>>inception_5a-relu_pool_proj ...
### Compiling layer group: inception_5a-pool>>inception_5a-relu_pool_proj ... complete.
### Compiling layer group: inception_5b-1x1>>inception_5b-relu_1x1 ...
### Compiling layer group: inception_5b-1x1>>inception_5b-relu_1x1 ... complete.
### Compiling layer group: inception_5b-3x3_reduce>>inception_5b-relu_3x3 ...
### Compiling layer group: inception_5b-3x3_reduce>>inception_5b-relu_3x3 ... complete.
### Compiling layer group: inception_5b-5x5_reduce>>inception_5b-relu_5x5 ...
### Compiling layer group: inception_5b-5x5_reduce>>inception_5b-relu_5x5 ... complete.
### Compiling layer group: inception_5b-pool>>inception_5b-relu_pool_proj ...
### Compiling layer group: inception_5b-pool>>inception_5b-relu_pool_proj ... complete.
### Compiling layer group: pool5-7x7_s1 ...
### Compiling layer group: pool5-7x7_s1 ... complete.
### Compiling layer group: newFC ...
### Compiling layer group: newFC ... complete.

### Allocating external memory buffers:

          offset_name          offset_address    allocated_space 
    _______________________    ______________    ________________

    "InputDataOffset"           "0x00000000"     "11.5 MB"       
    "OutputResultOffset"        "0x00b7c000"     "4.0 kB"        
    "SchedulerDataOffset"       "0x00b7d000"     "656.0 kB"      
    "SystemBufferOffset"        "0x00c21000"     "1.6 MB"        
    "InstructionDataOffset"     "0x00dae000"     "5.1 MB"        
    "ConvWeightDataOffset"      "0x012ca000"     "26.3 MB"       
    "FCWeightDataOffset"        "0x02d12000"     "20.0 kB"       
    "EndOffset"                 "0x02d17000"     "Total: 45.1 MB"

### Network compilation complete.

dn = struct with fields:
             weights: [1×1 struct]
        instructions: [1×1 struct]
           registers: [1×1 struct]
    syncInstructions: [1×1 struct]
        constantData: {}
             ddrInfo: [1×1 struct]
       resourceTable: [6×2 table]

Program Bitstream onto FPGA and Download Network Weights

To deploy the network on the Intel Arria10 SoC hardware, run the deploy function of the dlhdl.Workflow object. This function uses the output of the compile function to program the FPGA board by using the programming file. The function also downloads the network weights and biases. The deploy function starts programming the FPGA device, displays progress messages, and the time it takes to deploy the network.

hW.deploy

### Programming FPGA Bitstream using Ethernet...
### Attempting to connect to the hardware board at 172.21.89.235...
### Connection successful
### Programming FPGA device on Intel SoC hardware board at 172.21.89.235...
### Attempting to connect to the hardware board at 172.21.89.235...
### Connection successful
### Copying FPGA programming files to SD card...
### Setting FPGA bitstream and devicetree for boot...
WARNING: Uboot script u-boot.scr detected, this may override boot settings
# Copying Bitstream dlprocessor.core.rbf to /mnt/hdlcoder_rd
# Set Bitstream to hdlcoder_rd/dlprocessor.core.rbf
# Copying Devicetree devicetree_dlhdl.dtb to /mnt/hdlcoder_rd
# Set Devicetree to hdlcoder_rd/devicetree_dlhdl.dtb
# Set up boot for Reference Design: 'LIBIIO CNN system with 3 AXI4 Master'
### Rebooting Intel SoC at 172.21.89.235...
### Reboot may take several seconds...
### Attempting to connect to the hardware board at 172.21.89.235...
### Connection successful
### Programming the FPGA bitstream has been completed successfully.
### Loading weights to Conv Processor.
### Conv Weights loaded. Current time is 30-Aug-2024 17:20:42
### Loading weights to FC Processor.
### FC Weights loaded. Current time is 30-Aug-2024 17:20:42

Load Example Image

I = imresize(readimage(imdsValidation,1),[224 224]);
figure
imshow(I)

Retrieve Image Prediction

Execute the predict function of the dlhdl.Workflow object and display the prediction results.

I = dlarray(single(I), 'SSCB');
[prediction, ~] = hW.predict(I,'Profile','off');

### Finished writing input activations.
### Running single input activation.

label = scores2label(prediction,categories(imdsTrain.Labels))

label = categorical
     MathWorks Cap

title(string(label));

Retrieve Deployed Network Performance

View the performance of the deployed network by using the predict method with the Profile argument set to on.

[~, speed] = hW.predict(I,'Profile','on')

### Finished writing input activations.
### Running single input activation.


              Deep Learning Processor Profiler Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   15759503                  0.10506                       1           15768622              9.5
    conv1-7x7_s2           1140266                  0.00760 
    pool1-3x3_s2            226586                  0.00151 
    pool1-norm1             311786                  0.00208 
    conv2-3x3_reduce        278554                  0.00186 
    conv2-3x3               826869                  0.00551 
    conv2-norm2             951786                  0.00635 
    pool2-3x3_s2            208085                  0.00139 
    inception_3a-1x1        198671                  0.00132 
    inception_3a-3x3_reduce    282091                  0.00188 
    inception_3a-3x3        196937                  0.00131 
    inception_3a-5x5_reduce     74040                  0.00049 
    inception_3a-5x5         35333                  0.00024 
    inception_3a-pool        94565                  0.00063 
    inception_3a-pool_proj    115441                  0.00077 
    inception_3b-1x1        653551                  0.00436 
    inception_3b-3x3_reduce    654292                  0.00436 
    inception_3b-3x3        368280                  0.00246 
    inception_3b-5x5_reduce    217445                  0.00145 
    inception_3b-5x5        179163                  0.00119 
    inception_3b-pool       179780                  0.00120 
    inception_3b-pool_proj    362604                  0.00242 
    pool3-3x3_s2            196154                  0.00131 
    inception_4a-1x1        336707                  0.00224 
    inception_4a-3x3_reduce    184070                  0.00123 
    inception_4a-3x3         84818                  0.00057 
    inception_4a-5x5_reduce     56369                  0.00038 
    inception_4a-5x5         14416                  0.00010 
    inception_4a-pool        77393                  0.00052 
    inception_4a-pool_proj    132997                  0.00089 
    inception_4b-1x1        303833                  0.00203 
    inception_4b-3x3_reduce    222853                  0.00149 
    inception_4b-3x3        103657                  0.00069 
    inception_4b-5x5_reduce     73094                  0.00049 
    inception_4b-5x5         25836                  0.00017 
    inception_4b-pool        82589                  0.00055 
    inception_4b-pool_proj    141626                  0.00094 
    inception_4c-1x1        249712                  0.00166 
    inception_4c-3x3_reduce    249732                  0.00166 
    inception_4c-3x3        130976                  0.00087 
    inception_4c-5x5_reduce     73554                  0.00049 
    inception_4c-5x5         26077                  0.00017 
    inception_4c-pool        82597                  0.00055 
    inception_4c-pool_proj    140968                  0.00094 
    inception_4d-1x1        223088                  0.00149 
    inception_4d-3x3_reduce    277099                  0.00185 
    inception_4d-3x3        161609                  0.00108 
    inception_4d-5x5_reduce     86784                  0.00058 
    inception_4d-5x5         32811                  0.00022 
    inception_4d-pool        83197                  0.00055 
    inception_4d-pool_proj    141023                  0.00094 
    inception_4e-1x1        480969                  0.00321 
    inception_4e-3x3_reduce    313328                  0.00209 
    inception_4e-3x3        195874                  0.00131 
    inception_4e-5x5_reduce     90182                  0.00060 
    inception_4e-5x5         63293                  0.00042 
    inception_4e-pool        85193                  0.00057 
    inception_4e-pool_proj    256940                  0.00171 
    pool4-3x3_s2             82528                  0.00055 
    inception_5a-1x1        397644                  0.00265 
    inception_5a-3x3_reduce    257289                  0.00172 
    inception_5a-3x3        102006                  0.00068 
    inception_5a-5x5_reduce     70980                  0.00047 
    inception_5a-5x5         33279                  0.00022 
    inception_5a-pool        54257                  0.00036 
    inception_5a-pool_proj    210584                  0.00140 
    inception_5b-1x1        583897                  0.00389 
    inception_5b-3x3_reduce    303907                  0.00203 
    inception_5b-3x3        143676                  0.00096 
    inception_5b-5x5_reduce     94133                  0.00063 
    inception_5b-5x5         47694                  0.00032 
    inception_5b-pool        54081                  0.00036 
    inception_5b-pool_proj    211324                  0.00141 
    pool5-7x7_s1             72201                  0.00048 
    newFC                     1894                  0.00001 
 * The clock frequency of the DL processor is: 150MHz

speed=75×5 table
                                   Latency(cycles)    Latency(seconds)    NumFrames    Total Latency(cycles)    Frame/s 
                                   _______________    ________________    _________    _____________________    ________

    Network                           1.576e+07             0.10506          "1"            "15768622"          "9.5126"
    ____conv1-7x7_s2                 1.1403e+06           0.0076018          ""             ""                  ""      
    ____pool1-3x3_s2                 2.2659e+05           0.0015106          ""             ""                  ""      
    ____pool1-norm1                  3.1179e+05           0.0020786          ""             ""                  ""      
    ____conv2-3x3_reduce             2.7855e+05            0.001857          ""             ""                  ""      
    ____conv2-3x3                    8.2687e+05           0.0055125          ""             ""                  ""      
    ____conv2-norm2                  9.5179e+05           0.0063452          ""             ""                  ""      
    ____pool2-3x3_s2                 2.0808e+05           0.0013872          ""             ""                  ""      
    ____inception_3a-1x1             1.9867e+05           0.0013245          ""             ""                  ""      
    ____inception_3a-3x3_reduce      2.8209e+05           0.0018806          ""             ""                  ""      
    ____inception_3a-3x3             1.9694e+05           0.0013129          ""             ""                  ""      
    ____inception_3a-5x5_reduce           74040           0.0004936          ""             ""                  ""      
    ____inception_3a-5x5                  35333          0.00023555          ""             ""                  ""      
    ____inception_3a-pool                 94565          0.00063043          ""             ""                  ""      
    ____inception_3a-pool_proj       1.1544e+05          0.00076961          ""             ""                  ""      
    ____inception_3b-1x1             6.5355e+05            0.004357          ""             ""                  ""      
      ⋮

The speed table contains the latency information for every layer, total network latency, and the overall network performance in frames per second (FPS). For more information, see Profile Inference Run.