Deploy Transfer Learning Network for Lane Detection

This example shows how to create, compile, and deploy a lane detection convolutional neural network (CNN) to an FPGA, and use MATLAB® to retrieve the prediction results.

Prerequisites

Xilinx ZCU102 SoC development kit

Load the Pretrained SeriesNetwork

Load the pretrained lanenet network.

snet = getLaneDetectionNetwork;

Normalize Input Layer

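Replace the image input layer with one that has the same input size and its Normalization property set to 'none', so that the network applies no normalization to the input.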

inputlayer = imageInputLayer(snet.Layers(1).InputSize,Normalization='none');
snet = SeriesNetwork([inputlayer;snet.Layers(2:end)]);

View the layers of the network by using the Deep Network Designer app.

deepNetworkDesigner(snet)

Define FPGA Board Interface

Define the target FPGA board programming interface by using the dlhdl.Target object. Specify the board vendor and use an Ethernet interface to connect the target device to the host computer.

hTarget = dlhdl.Target('Xilinx','Interface','Ethernet');
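Before compiling, it can help to confirm that the host computer can reach the board. The following is a minimal sketch, assuming that the IPAddress name-value argument and the validateConnection method of dlhdl.Target are available in your release. The address 192.168.1.101 is a placeholder and not a value from this example; substitute the address of your own board.

```matlab
% Sketch only: the IP address below is a placeholder, replace it with your board's address.
hTarget = dlhdl.Target('Xilinx','Interface','Ethernet','IPAddress','192.168.1.101');

% Check that the host computer can communicate with the board over Ethernet.
validateConnection(hTarget)
```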

Generate Custom Bitstream to Deploy Network

The lane detection network contains multiple cross-channel normalization layers. To support these layers on hardware, enable the LRNBlockGeneration property of the conv module in the bitstream that you use for FPGA inference. The zcu102_single reference bitstream does not have this property enabled, so generate a new bitstream by using the following commands. You can then use the generated bitstream with a dlhdl.Workflow object for inference.

When you create a dlhdl.ProcessorConfig object for a reference bitstream, make sure that the bitstream name matches the data type and the FPGA board that you are targeting. In this example, the target FPGA board is the Xilinx ZCU102 SoC board and the data type is single. Update the processor configuration by enabling the LRNBlockGeneration property and disabling the SegmentationBlockGeneration property. Disabling the SegmentationBlockGeneration property ensures that the Deep Learning IP fits on the FPGA and avoids overuse of resources. If you are targeting the Xilinx ZC706 board, replace 'zcu102_single' with 'zc706_single' in the first command.

hPC = dlhdl.ProcessorConfig('Bitstream', 'zcu102_single');
setModuleProperty(hPC,'conv', 'LRNBlockGeneration', 'on');
setModuleProperty(hPC,'conv', 'SegmentationBlockGeneration', 'off');
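Because the build step is slow, it may be worth reading the settings back before you start it. This is a minimal sketch, assuming that the getModuleProperty method of dlhdl.ProcessorConfig is available in your release. The variable names lrnSetting and segSetting are illustrative.

```matlab
% Read back the conv module settings from the processor configuration hPC.
lrnSetting = getModuleProperty(hPC,'conv','LRNBlockGeneration');
segSetting = getModuleProperty(hPC,'conv','SegmentationBlockGeneration');

% Display the values so that you can confirm them before building.
disp(lrnSetting)
disp(segSetting)
```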

Generate a custom bitstream by using the dlhdl.buildProcessor function. To learn how to use the generated bitstream file, see Generate Custom Bitstream.

dlhdl.buildProcessor(hPC)
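After the build finishes, the generated files must be where dlhdl.Workflow can find them. The following is a minimal sketch, assuming that the build used a default output folder named dlhdl_prj in the current folder and produced files named dlprocessor.bit and dlprocessor.mat. If your setup differs, check the build log for the actual location.

```matlab
% Assumption: dlhdl.buildProcessor wrote its output to the folder dlhdl_prj.
buildDir = fullfile(pwd,'dlhdl_prj');
files = {'dlprocessor.bit','dlprocessor.mat'};

for k = 1:numel(files)
    src = fullfile(buildDir,files{k});
    if isfile(src)
        % Copy the generated file to the current working folder.
        copyfile(src,pwd);
    else
        warning('Expected build output not found: %s',src);
    end
end
```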

Prepare Network for Deployment

Prepare the network for deployment by creating a dlhdl.Workflow object. Specify the network, the bitstream name, and the target. Ensure that the bitstream matches the data type and the FPGA board that you are targeting. In this example, the bitstream is the custom dlprocessor.bit file, the target FPGA board is the Xilinx® Zynq® UltraScale+™ MPSoC ZCU102 board, and the data type is single.

hW = dlhdl.Workflow(Network=snet,Bitstream='dlprocessor.bit',Target=hTarget);

Compile Network

Run the compile method of the dlhdl.Workflow object to compile the network and generate the instructions, weights, and biases for deployment.

dn = compile(hW);

          offset_name          offset_address     allocated_space 
    _______________________    ______________    _________________

    "InputDataOffset"           "0x00000000"     "24.0 MB"        
    "OutputResultOffset"        "0x01800000"     "4.0 MB"         
    "SystemBufferOffset"        "0x01c00000"     "28.0 MB"        
    "InstructionDataOffset"     "0x03800000"     "4.0 MB"         
    "ConvWeightDataOffset"      "0x03c00000"     "16.0 MB"        
    "FCWeightDataOffset"        "0x04c00000"     "148.0 MB"       
    "EndOffset"                 "0x0e000000"     "Total: 224.0 MB"
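The offsets in this table are cumulative: each memory region starts where the previous one ends. As a quick check, this sketch converts the addresses printed above to megabytes and recovers the allocated sizes. The variable names are illustrative.

```matlab
% Offset addresses copied from the compile output above (without the 0x prefix).
offsetHex = {'00000000';'01800000';'01c00000';'03800000'; ...
             '03c00000';'04c00000';'0e000000'};

% Convert each address from bytes to megabytes (1 MB = 2^20 bytes).
offsetMB = hex2dec(offsetHex) / 2^20;

% The gap between consecutive offsets is the space allocated to each region.
allocatedMB = diff(offsetMB);

disp(offsetMB')     % 0 24 28 56 60 76 224
disp(allocatedMB')  % 24 4 28 4 16 148
```

The last offset, 224 MB, matches the total reported in the EndOffset row.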

Program Bitstream onto FPGA and Download Network Weights

To deploy the network on the Xilinx® Zynq® UltraScale+ MPSoC ZCU102 hardware, run the deploy method of the dlhdl.Workflow object. This method programs the FPGA board by using the output of the compile method and the programming file, downloads the network weights and biases, and displays progress messages and the time it takes to deploy the network.

deploy(hW)

### FPGA bitstream programming has been skipped as the same bitstream is already loaded on the target FPGA.
### Loading weights to FC Processor.
### 13% finished, current time is 28-Jun-2020 12:36:09.
### 25% finished, current time is 28-Jun-2020 12:36:10.
### 38% finished, current time is 28-Jun-2020 12:36:11.
### 50% finished, current time is 28-Jun-2020 12:36:12.
### 63% finished, current time is 28-Jun-2020 12:36:13.
### 75% finished, current time is 28-Jun-2020 12:36:14.
### 88% finished, current time is 28-Jun-2020 12:36:14.
### FC Weights loaded. Current time is 28-Jun-2020 12:36:15

Test Network

Run the demoOnVideo helper function. This function loads the example video, executes the predict method of the dlhdl.Workflow object, and then plots the result. See Helper Functions.

demoOnVideo(hW,1);

### Finished writing input activations.
### Running single input activations.


              Deep Learning Processor Profiler Performance Results

                   LastLayerLatency(cycles)   LastLayerLatency(seconds)       FramesNum      Total Latency     Frames/s
                         -------------             -------------              ---------        ---------       ---------
Network                   24904175                  0.11320                       1           24904217              8.8
    conv_module            8967009                  0.04076 
        conv1              1396633                  0.00635 
        norm1               623003                  0.00283 
        pool1               226855                  0.00103 
        conv2              3410044                  0.01550 
        norm2               378531                  0.00172 
        pool2               233635                  0.00106 
        conv3              1139419                  0.00518 
        conv4               892918                  0.00406 
        conv5               615897                  0.00280 
        pool5                50189                  0.00023 
    fc_module             15937166                  0.07244 
        fc6               15819257                  0.07191 
        fcLane1             117125                  0.00053 
        fcLane2                782                  0.00000 
 * The clock frequency of the DL processor is: 220MHz
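The latency and frame-rate columns follow directly from the cycle counts and the reported clock frequency. This sketch reproduces them from the numbers in the table above. The variable names are illustrative.

```matlab
clockHz       = 220e6;      % DL processor clock frequency from the profiler note
networkCycles = 24904175;   % LastLayerLatency(cycles) for the whole network
totalCycles   = 24904217;   % Total Latency for one frame
convCycles    = 8967009;    % conv_module
fcCycles      = 15937166;   % fc_module

latencySeconds  = networkCycles / clockHz;   % about 0.1132 s
framesPerSecond = clockHz / totalCycles;     % about 8.8 frames/s
fcShare         = fcCycles / networkCycles;  % fraction of cycles spent in fc_module

% The two module totals add up to the network total in this table.
modulesMatch = (convCycles + fcCycles == networkCycles);

fprintf('Latency: %.5f s, throughput: %.1f frames/s\n',latencySeconds,framesPerSecond);
fprintf('fc_module share of network cycles: %.0f%%\n',100*fcShare);
```

In this run, the fully connected module accounts for roughly 64% of the cycles, almost all of it in fc6.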

Helper Functions

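function demoOnVideo(hW, frameLimit)
% Run lane detection on the example video by using the deployed network.
% hW is a dlhdl.Workflow object. frameLimit is the maximum number of
% frames to process.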

if nargin < 2
    frameLimit = 1000000;
end

writeToFile = false;

videoFile = 'caltech_cordova1.avi'; 

% Download the example video if it is not already in the current folder
if ~isfile(videoFile)
    url = append('https://www.mathworks.com/supportfiles/gpucoder/media/', videoFile);
    websave(videoFile, url);
end

ss = getLaneDetectionData();

sensor = caltechMonoCamera();

%Initialize video readers and writers
vR = VideoReader(videoFile);
vPlayer = vision.DeployableVideoPlayer();

if writeToFile
    [~, name, ext] = fileparts(videoFile);
    outFileName = [name '_out' ext];
    vW = VideoWriter(outFileName);
    vW.FrameRate = vR.FrameRate;
    open(vW);
end

isOpen = true;

frameCount = 0;
while frameCount < frameLimit && isOpen && hasFrame(vR)
    testImg = readFrame(vR);
    inputImg = imresize(testImg, [227 227]);

    % Run inference on the FPGA with profiling enabled
    outputs = hW.predict(inputImg, 'Profile', 'on');

    laneim = showNetworkOutputs(testImg, outputs, ss.laneCoeffMeans, ss.laneCoeffsStds, sensor);
    step(vPlayer, laneim);
    frameCount = frameCount + 1;
    if writeToFile
        writeVideo(vW, laneim);
    end
    isOpen = vPlayer.isOpen();
end

if writeToFile
    close(vW);
end

release(vPlayer);
delete(vR);

end

function laneim = showNetworkOutputs(img, lanecoeffsNetworkOutput, laneCoeffMeans, laneCoeffStds, sensor)
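% Convert the network outputs to lane boundary parameters and overlay the
% detected lane boundaries on the image.

% Rescale the outputs by the stored standard deviations and means
params = lanecoeffsNetworkOutput .* laneCoeffStds + laneCoeffMeans;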

isRightLaneFound = abs(params(6)) > 0.5; 
isLeftLaneFound =  abs(params(3)) > 0.5;

if isRightLaneFound
    rtBoundary = parabolicLaneBoundary(params(4:6));
else
    rtBoundary = parabolicLaneBoundary.empty(1, 0);
end

if isLeftLaneFound
    ltBoundary = parabolicLaneBoundary(params(1:3));
else
    ltBoundary = parabolicLaneBoundary.empty(1, 0);
end

laneboundaries = [ltBoundary, rtBoundary];

vehicleXPoints = 3:30; 
laneim = insertLaneBoundary(img,laneboundaries,sensor, vehicleXPoints, 'Color', 'green');

end
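To make the post-processing concrete, the sketch below applies the same rescaling and threshold test to made-up numbers. It evaluates the resulting parabola with polyval instead of parabolicLaneBoundary, so it runs without Automated Driving Toolbox. All numeric values are hypothetical. The sketch assumes that each lane is described by three coefficients [a b c] with y = a*x^2 + b*x + c.

```matlab
% Hypothetical network output: [a b c] for the left lane, then for the right lane.
networkOutput = [0 0 0.5 0 0 -0.5];

% Hypothetical statistics, standing in for laneCoeffMeans and laneCoeffStds.
coeffMeans = [0 0 1.8 0 0 -1.8];
coeffStds  = [0.001 0.01 0.4 0.001 0.01 0.4];

% Rescale the outputs, as in showNetworkOutputs.
params = networkOutput .* coeffStds + coeffMeans;

% A lane counts as found when the magnitude of its constant term exceeds 0.5.
isLeftLaneFound  = abs(params(3)) > 0.5;
isRightLaneFound = abs(params(6)) > 0.5;

% Evaluate the left boundary y = a*x^2 + b*x + c over the same x range.
vehicleXPoints = 3:30;
yLeft = polyval(params(1:3),vehicleXPoints);
```

With these values both lanes pass the threshold, and the left boundary is a straight line at a constant lateral offset because a and b are zero.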