Image Category Classification by Using Deep Learning

This example shows how to create, compile, and deploy a dlhdl.Workflow object with ResNet-18 as the network object by using the Deep Learning HDL Toolbox™ Support Package for Xilinx FPGA and SoC. Use MATLAB® to retrieve the prediction results from the target device. ResNet-18 is a pretrained convolutional neural network that has been trained on over a million images and can classify images into 1000 object categories (such as keyboard, coffee mug, pencil, and many animals). You can also use VGG-19 and DarkNet-19 as the network objects.

Prerequisites

  • Xilinx ZCU102 SoC Development Kit

  • Deep Learning HDL Toolbox™ Support Package for Xilinx FPGA and SoC

  • Deep Learning Toolbox™ Model for ResNet-18 Network

  • Deep Learning Toolbox™

  • Deep Learning HDL Toolbox™

Load the Pretrained Network

To load the pretrained Directed Acyclic Graph (DAG) network resnet18, enter:

net = resnet18;

To load the pretrained series network vgg19, enter:

% net = vgg19;

To load the pretrained series network darknet19, enter:

% net = darknet19;

The pretrained ResNet-18 network contains 71 layers, including input, convolution, batch normalization, ReLU, max pooling, addition, global average pooling, fully connected, softmax, and classification output layers. To view the layers of the pretrained ResNet-18 network, enter:

analyzeNetwork(net)

Create Target Object

Use the dlhdl.Target class to create a target object with a custom name for your target device and an interface to connect your target device to the host computer. Interface options are JTAG and Ethernet. To use JTAG, install Xilinx™ Vivado™ Design Suite 2019.2. To set the Xilinx Vivado toolpath, enter:

% hdlsetuptoolpath('ToolName', 'Xilinx Vivado', 'ToolPath', 'C:\Xilinx\Vivado\2019.2\bin\vivado.bat');
hTarget = dlhdl.Target('Xilinx', 'Interface', 'Ethernet');

Create Workflow Object

Use the dlhdl.Workflow class to create an object. When you create the object, specify the network and the bitstream name. Specify the pretrained ResNet-18 neural network as the network. Make sure that the bitstream name matches the data type and the FPGA board that you are targeting. In this example, the target FPGA board is the Xilinx ZCU102 SoC board and the bitstream uses the single data type.

hW = dlhdl.Workflow('Network', net, 'Bitstream', 'zcu102_single', 'Target', hTarget);

Compile the ResNet-18 DAG Network

To compile the ResNet-18 DAG network, run the compile method of the dlhdl.Workflow object. You can optionally specify the maximum number of input frames with InputFrameNumberLimit. You can also set HardwareNormalization to 'off' so that the input image normalization happens in software.

dn = compile(hW, 'InputFrameNumberLimit', 15, 'HardwareNormalization', 'off')
### Compiling network for Deep Learning FPGA prototyping ...
### Targeting FPGA bitstream zcu102_single.
### The network includes the following layers:
     1   'data'                              Image Input                  224×224×3 images with 'zscore' normalization                          (SW Layer)
     2   'conv1'                             Convolution                  64 7×7×3 convolutions with stride [2  2] and padding [3  3  3  3]     (HW Layer)
     3   'bn_conv1'                          Batch Normalization          Batch normalization with 64 channels                                  (HW Layer)
     4   'conv1_relu'                        ReLU                         ReLU                                                                  (HW Layer)
     5   'pool1'                             Max Pooling                  3×3 max pooling with stride [2  2] and padding [1  1  1  1]           (HW Layer)
     6   'res2a_branch2a'                    Convolution                  64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]    (HW Layer)
     7   'bn2a_branch2a'                     Batch Normalization          Batch normalization with 64 channels                                  (HW Layer)
     8   'res2a_branch2a_relu'               ReLU                         ReLU                                                                  (HW Layer)
     9   'res2a_branch2b'                    Convolution                  64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]    (HW Layer)
    10   'bn2a_branch2b'                     Batch Normalization          Batch normalization with 64 channels                                  (HW Layer)
    11   'res2a'                             Addition                     Element-wise addition of 2 inputs                                     (HW Layer)
    12   'res2a_relu'                        ReLU                         ReLU                                                                  (HW Layer)
    13   'res2b_branch2a'                    Convolution                  64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]    (HW Layer)
    14   'bn2b_branch2a'                     Batch Normalization          Batch normalization with 64 channels                                  (HW Layer)
    15   'res2b_branch2a_relu'               ReLU                         ReLU                                                                  (HW Layer)
    16   'res2b_branch2b'                    Convolution                  64 3×3×64 convolutions with stride [1  1] and padding [1  1  1  1]    (HW Layer)
    17   'bn2b_branch2b'                     Batch Normalization          Batch normalization with 64 channels                                  (HW Layer)
    18   'res2b'                             Addition                     Element-wise addition of 2 inputs                                     (HW Layer)
    19   'res2b_relu'                        ReLU                         ReLU                                                                  (HW Layer)
    20   'res3a_branch2a'                    Convolution                  128 3×3×64 convolutions with stride [2  2] and padding [1  1  1  1]   (HW Layer)
    21   'bn3a_branch2a'                     Batch Normalization          Batch normalization with 128 channels                                 (HW Layer)
    22   'res3a_branch2a_relu'               ReLU                         ReLU                                                                  (HW Layer)
    23   'res3a_branch2b'                    Convolution                  128 3×3×128 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    24   'bn3a_branch2b'                     Batch Normalization          Batch normalization with 128 channels                                 (HW Layer)
    25   'res3a'                             Addition                     Element-wise addition of 2 inputs                                     (HW Layer)
    26   'res3a_relu'                        ReLU                         ReLU                                                                  (HW Layer)
    27   'res3a_branch1'                     Convolution                  128 1×1×64 convolutions with stride [2  2] and padding [0  0  0  0]   (HW Layer)
    28   'bn3a_branch1'                      Batch Normalization          Batch normalization with 128 channels                                 (HW Layer)
    29   'res3b_branch2a'                    Convolution                  128 3×3×128 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    30   'bn3b_branch2a'                     Batch Normalization          Batch normalization with 128 channels                                 (HW Layer)
    31   'res3b_branch2a_relu'               ReLU                         ReLU                                                                  (HW Layer)
    32   'res3b_branch2b'                    Convolution                  128 3×3×128 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    33   'bn3b_branch2b'                     Batch Normalization          Batch normalization with 128 channels                                 (HW Layer)
    34   'res3b'                             Addition                     Element-wise addition of 2 inputs                                     (HW Layer)
    35   'res3b_relu'                        ReLU                         ReLU                                                                  (HW Layer)
    36   'res4a_branch2a'                    Convolution                  256 3×3×128 convolutions with stride [2  2] and padding [1  1  1  1]  (HW Layer)
    37   'bn4a_branch2a'                     Batch Normalization          Batch normalization with 256 channels                                 (HW Layer)
    38   'res4a_branch2a_relu'               ReLU                         ReLU                                                                  (HW Layer)
    39   'res4a_branch2b'                    Convolution                  256 3×3×256 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    40   'bn4a_branch2b'                     Batch Normalization          Batch normalization with 256 channels                                 (HW Layer)
    41   'res4a'                             Addition                     Element-wise addition of 2 inputs                                     (HW Layer)
    42   'res4a_relu'                        ReLU                         ReLU                                                                  (HW Layer)
    43   'res4a_branch1'                     Convolution                  256 1×1×128 convolutions with stride [2  2] and padding [0  0  0  0]  (HW Layer)
    44   'bn4a_branch1'                      Batch Normalization          Batch normalization with 256 channels                                 (HW Layer)
    45   'res4b_branch2a'                    Convolution                  256 3×3×256 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    46   'bn4b_branch2a'                     Batch Normalization          Batch normalization with 256 channels                                 (HW Layer)
    47   'res4b_branch2a_relu'               ReLU                         ReLU                                                                  (HW Layer)
    48   'res4b_branch2b'                    Convolution                  256 3×3×256 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    49   'bn4b_branch2b'                     Batch Normalization          Batch normalization with 256 channels                                 (HW Layer)
    50   'res4b'                             Addition                     Element-wise addition of 2 inputs                                     (HW Layer)
    51   'res4b_relu'                        ReLU                         ReLU                                                                  (HW Layer)
    52   'res5a_branch2a'                    Convolution                  512 3×3×256 convolutions with stride [2  2] and padding [1  1  1  1]  (HW Layer)
    53   'bn5a_branch2a'                     Batch Normalization          Batch normalization with 512 channels                                 (HW Layer)
    54   'res5a_branch2a_relu'               ReLU                         ReLU                                                                  (HW Layer)
    55   'res5a_branch2b'                    Convolution                  512 3×3×512 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    56   'bn5a_branch2b'                     Batch Normalization          Batch normalization with 512 channels                                 (HW Layer)
    57   'res5a'                             Addition                     Element-wise addition of 2 inputs                                     (HW Layer)
    58   'res5a_relu'                        ReLU                         ReLU                                                                  (HW Layer)
    59   'res5a_branch1'                     Convolution                  512 1×1×256 convolutions with stride [2  2] and padding [0  0  0  0]  (HW Layer)
    60   'bn5a_branch1'                      Batch Normalization          Batch normalization with 512 channels                                 (HW Layer)
    61   'res5b_branch2a'                    Convolution                  512 3×3×512 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    62   'bn5b_branch2a'                     Batch Normalization          Batch normalization with 512 channels                                 (HW Layer)
    63   'res5b_branch2a_relu'               ReLU                         ReLU                                                                  (HW Layer)
    64   'res5b_branch2b'                    Convolution                  512 3×3×512 convolutions with stride [1  1] and padding [1  1  1  1]  (HW Layer)
    65   'bn5b_branch2b'                     Batch Normalization          Batch normalization with 512 channels                                 (HW Layer)
    66   'res5b'                             Addition                     Element-wise addition of 2 inputs                                     (HW Layer)
    67   'res5b_relu'                        ReLU                         ReLU                                                                  (HW Layer)
    68   'pool5'                             2-D Global Average Pooling   2-D global average pooling                                            (HW Layer)
    69   'fc1000'                            Fully Connected              1000 fully connected layer                                            (HW Layer)
    70   'prob'                              Softmax                      softmax                                                               (HW Layer)
    71   'ClassificationLayer_predictions'   Classification Output        crossentropyex with 'tench' and 999 other classes                     (SW Layer)
                                                                                                                                              
### Optimizing network: Fused 'nnet.cnn.layer.BatchNormalizationLayer' into 'nnet.cnn.layer.Convolution2DLayer'
### Notice: The layer 'data' with type 'nnet.cnn.layer.ImageInputLayer' is implemented in software.
### Notice: The layer 'prob' with type 'nnet.cnn.layer.SoftmaxLayer' is implemented in software.
### Notice: The layer 'ClassificationLayer_predictions' with type 'nnet.cnn.layer.ClassificationOutputLayer' is implemented in software.
### Compiling layer group: conv1>>pool1 ...
### Compiling layer group: conv1>>pool1 ... complete.
### Compiling layer group: res2a_branch2a>>res2a_branch2b ...
### Compiling layer group: res2a_branch2a>>res2a_branch2b ... complete.
### Compiling layer group: res2b_branch2a>>res2b_branch2b ...
### Compiling layer group: res2b_branch2a>>res2b_branch2b ... complete.
### Compiling layer group: res3a_branch1 ...
### Compiling layer group: res3a_branch1 ... complete.
### Compiling layer group: res3a_branch2a>>res3a_branch2b ...
### Compiling layer group: res3a_branch2a>>res3a_branch2b ... complete.
### Compiling layer group: res3b_branch2a>>res3b_branch2b ...
### Compiling layer group: res3b_branch2a>>res3b_branch2b ... complete.
### Compiling layer group: res4a_branch1 ...
### Compiling layer group: res4a_branch1 ... complete.
### Compiling layer group: res4a_branch2a>>res4a_branch2b ...
### Compiling layer group: res4a_branch2a>>res4a_branch2b ... complete.
### Compiling layer group: res4b_branch2a>>res4b_branch2b ...
### Compiling layer group: res4b_branch2a>>res4b_branch2b ... complete.
### Compiling layer group: res5a_branch1 ...
### Compiling layer group: res5a_branch1 ... complete.
### Compiling layer group: res5a_branch2a>>res5a_branch2b ...
### Compiling layer group: res5a_branch2a>>res5a_branch2b ... complete.
### Compiling layer group: res5b_branch2a>>res5b_branch2b ...
### Compiling layer group: res5b_branch2a>>res5b_branch2b ... complete.
### Compiling layer group: pool5 ...
### Compiling layer group: pool5 ... complete.
### Compiling layer group: fc1000 ...
### Compiling layer group: fc1000 ... complete.

### Allocating external memory buffers:

          offset_name          offset_address     allocated_space 
    _______________________    ______________    _________________

    "InputDataOffset"           "0x00000000"     "12.0 MB"        
    "OutputResultOffset"        "0x00c00000"     "4.0 MB"         
    "SchedulerDataOffset"       "0x01000000"     "4.0 MB"         
    "SystemBufferOffset"        "0x01400000"     "28.0 MB"        
    "InstructionDataOffset"     "0x03000000"     "4.0 MB"         
    "ConvWeightDataOffset"      "0x03400000"     "52.0 MB"        
    "FCWeightDataOffset"        "0x06800000"     "4.0 MB"         
    "EndOffset"                 "0x06c00000"     "Total: 108.0 MB"

### Network compilation complete.
dn = struct with fields:
             weights: [1×1 struct]
        instructions: [1×1 struct]
           registers: [1×1 struct]
    syncInstructions: [1×1 struct]
        constantData: {}
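
The external memory table in the log can be cross-checked with a little arithmetic: the size of each region should equal the difference between consecutive offset addresses, and EndOffset should equal the total. The sketch below copies the addresses from the log above. The variable names are illustrative and are not part of the toolbox.

```matlab
% Offset names and addresses copied from the compile log above.
names = {'InputDataOffset', 'OutputResultOffset', 'SchedulerDataOffset', ...
         'SystemBufferOffset', 'InstructionDataOffset', ...
         'ConvWeightDataOffset', 'FCWeightDataOffset'};
addrHex = {'00000000', '00c00000', '01000000', '01400000', ...
           '03000000', '03400000', '06800000', '06c00000'};  % last entry is EndOffset

addr    = hex2dec(addrHex);        % byte addresses as doubles
sizesMB = diff(addr) / 2^20;       % size of each region in MB (1 MB = 2^20 bytes)
totalMB = addr(end) / 2^20;        % EndOffset is also the total allocation

% Print one line per region, then the total.
for k = 1:numel(names)
    fprintf('%-22s %6.1f MB\n', names{k}, sizesMB(k));
end
fprintf('%-22s %6.1f MB\n', 'Total', totalMB);
```

The computed sizes can be compared line by line with the allocated_space column of the log.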

Program Bitstream onto FPGA and Download Network Weights

To deploy the network on the Xilinx ZCU102 hardware, run the deploy function of the dlhdl.Workflow object. This function uses the output of the compile function to program the FPGA board by using the programming file. It also downloads the network weights and biases. The deploy function starts programming the FPGA device and displays progress messages and the time it takes to deploy the network.

deploy(hW)
### Programming FPGA Bitstream using Ethernet...
Downloading target FPGA device configuration over Ethernet to SD card ...
# Copied /tmp/hdlcoder_rd to /mnt/hdlcoder_rd
# Copying Bitstream hdlcoder_system.bit to /mnt/hdlcoder_rd
# Set Bitstream to hdlcoder_rd/hdlcoder_system.bit
# Copying Devicetree devicetree_dlhdl.dtb to /mnt/hdlcoder_rd
# Set Devicetree to hdlcoder_rd/devicetree_dlhdl.dtb
# Set up boot for Reference Design: 'AXI-Stream DDR Memory Access : 3-AXIM'

Downloading target FPGA device configuration over Ethernet to SD card done. The system will now reboot for persistent changes to take effect.


System is rebooting . . . . . .
### Programming the FPGA bitstream has been completed successfully.
### Loading weights to Conv Processor.
### Conv Weights loaded. Current time is 10-Dec-2021 16:01:37
### Loading weights to FC Processor.
### FC Weights loaded. Current time is 10-Dec-2021 16:01:37

Load Image for Prediction

Load the example image.

imgFile = 'espressomaker.jpg';
inputImg = imresize(imread(imgFile), [224,224]);
imshow(inputImg)
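
The layer listing shows that the network expects a 224-by-224-by-3 input. The following is a minimal sketch of a more defensive preprocessing step, assuming the source image might be grayscale or a different size. The synthetic image and the variable names targetSize, img, and prepared are placeholders introduced for illustration.

```matlab
targetSize = [224 224];                    % spatial input size of ResNet-18

% Synthetic grayscale stand-in; replace with imread(imgFile) for a real image.
img = uint8(randi([0 255], 300, 400));

% Replicate a single channel to three channels so the input is H-by-W-by-3.
if size(img, 3) == 1
    img = repmat(img, [1 1 3]);
end

% Resize to the network input size. Passing both dimensions means the
% aspect ratio of the original image is not preserved.
prepared = imresize(img, targetSize);
```

The result has the same shape as inputImg in the example and can be converted with single before it is passed to predict.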

Run Prediction for One Image

Execute the predict method on the dlhdl.Workflow object and then show the label in the MATLAB command window.

[prediction, speed] = predict(hW,single(inputImg),'Profile','on');
### Finished writing input activations.
### Running single input activation.

              Deep Learning Processor Profiler Performance Results

                   LastFrameLatency(cycles)   LastFrameLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   24100982                  0.10955                       1           24103448              9.1
    conv1                  2225590                  0.01012 
    pool1                   577207                  0.00262 
    res2a_branch2a          973263                  0.00442 
    res2a_branch2b          973083                  0.00442 
    res2a                   307582                  0.00140 
    res2b_branch2a          973221                  0.00442 
    res2b_branch2b          973548                  0.00443 
    res2b                   307602                  0.00140 
    res3a_branch1           541072                  0.00246 
    res3a_branch2a          749668                  0.00341 
    res3a_branch2b          908194                  0.00413 
    res3a                   153885                  0.00070 
    res3b_branch2a          908013                  0.00413 
    res3b_branch2b          907705                  0.00413 
    res3b                   153935                  0.00070 
    res4a_branch1           491540                  0.00223 
    res4a_branch2a          491680                  0.00223 
    res4a_branch2b          889776                  0.00404 
    res4a                    77044                  0.00035 
    res4b_branch2a          889897                  0.00404 
    res4b_branch2b          889873                  0.00404 
    res4b                    77053                  0.00035 
    res5a_branch1          1057762                  0.00481 
    res5a_branch2a         1057907                  0.00481 
    res5a_branch2b         2058997                  0.00936 
    res5a                    38602                  0.00018 
    res5b_branch2a         2058860                  0.00936 
    res5b_branch2b         2059549                  0.00936 
    res5b                    38704                  0.00018 
    pool5                    73721                  0.00034 
    fc1000                  216262                  0.00098 
 * The clock frequency of the DL processor is: 220MHz
[val, idx] = max(prediction);
net.Layers(end).ClassNames{idx}
ans = 
'Polaroid camera'
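
The profiler reports latency in clock cycles, so the seconds and frames-per-second columns follow from the 220 MHz clock. The sketch below reproduces that arithmetic from the numbers in the table above. It assumes that Frames/s is computed as FramesNum times the clock frequency divided by Total Latency, which is consistent with the reported 9.1 but is not stated in the source.

```matlab
% Values copied from the profiler output above.
clockHz         = 220e6;      % DL processor clock frequency
lastFrameCycles = 24100982;   % LastFrameLatency(cycles), Network row
totalCycles     = 24103448;   % Total Latency, Network row
numFrames       = 1;          % FramesNum

latencySec   = lastFrameCycles / clockHz;          % cycles to seconds
framesPerSec = numFrames * clockHz / totalCycles;  % assumed throughput formula

% Share of the frame latency spent in the first convolution layer.
conv1Cycles = 2225590;
conv1Share  = conv1Cycles / lastFrameCycles;

fprintf('Latency: %.5f s, throughput: %.1f frames/s, conv1 share: %.1f%%\n', ...
        latencySec, framesPerSec, 100 * conv1Share);
```

The same division applies to any row of the table, which makes it easy to see which layer groups dominate the frame time.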

Run Prediction for Multiple Images

Load multiple images and retrieve their prediction results by using the multiple frame support feature. For more information, see Multiple Frame Support.

The demoOnImage function loads multiple images and retrieves their prediction results. The annotateresults function displays each prediction result on top of its image, and the images are assembled into a 3-by-5 array.

imshow(inputImg)

demoOnImage; 
### Finished writing input activations.
### Running in multi-frame mode with 15 inputs.
FPGA PREDICTION: binder 
FPGA PREDICTION: file 
FPGA PREDICTION: barber chair 
FPGA PREDICTION: mixing bowl 
FPGA PREDICTION: washbasin 
FPGA PREDICTION: desk 
FPGA PREDICTION: envelope 
FPGA PREDICTION: Polaroid camera 
FPGA PREDICTION: typewriter keyboard 
FPGA PREDICTION: monitor 
FPGA PREDICTION: sunglass 
FPGA PREDICTION: ballpoint 
FPGA PREDICTION: can opener 
FPGA PREDICTION: analog clock 
FPGA PREDICTION: ashcan
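
The source does not list demoOnImage, so the following is only a sketch of how a multi-frame input could be assembled. It assumes that frames are stacked along the fourth dimension and that the number of frames must not exceed the InputFrameNumberLimit passed to compile. The synthetic frames and the variable names are placeholders.

```matlab
frameLimit = 15;                 % matches 'InputFrameNumberLimit' used at compile time
inputSize  = [224 224 3];        % ResNet-18 input size from the layer listing

% Placeholder frames; in practice each cell would hold an image already
% resized to inputSize, for example with imresize(imread(file), [224 224]).
frames = cell(1, frameLimit);
for k = 1:frameLimit
    frames{k} = uint8(randi([0 255], inputSize));
end

% Stack the frames along the fourth dimension and convert to single.
batch = single(cat(4, frames{:}));
assert(size(batch, 4) <= frameLimit, 'Batch exceeds the compiled frame limit.');

% Raw single-precision payload of the batch in MB (4 bytes per element).
batchMB = numel(batch) * 4 / 2^20;
fprintf('Batch of %d frames, %.2f MB\n', size(batch, 4), batchMB);

% With a deployed workflow object hW, the batch would then be passed to
% predict in one call (not run here because it needs the FPGA board):
% [prediction, speed] = predict(hW, batch, 'Profile', 'on');
```

Checking the frame count before calling predict keeps a mismatch with the compiled limit from surfacing only at run time on the board.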