Running an Embedded Application on the NVIDIA Jetson TX2 Developer Kit

This example shows how to generate CUDA® code from a SeriesNetwork object and target the NVIDIA® Jetson TX2 board with an external camera. The example uses the AlexNet deep learning network to classify images from a USB webcam video stream.


Prerequisites

  • Deep Learning Toolbox™ to load the SeriesNetwork object.

  • GPU Coder™ for generating CUDA code.

  • GPU Coder Interface for Deep Learning Libraries support package. To install this support package, use the Add-On Explorer.

  • NVIDIA Jetson TX2 developer kit.

  • USB camera to connect to the TX2.

  • NVIDIA CUDA toolkit installed on the TX2.

  • NVIDIA cuDNN 5.0 library installed on the TX2.

  • OpenCV 3.3.0 libraries for video read and image display operations installed on the TX2.

  • OpenCV header and library files on the NVCC compiler search path of the TX2.

  • Environment variables for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-party Products (GPU Coder). For setting up the environment variables, see Setting Up the Prerequisite Products (GPU Coder).

  • This example is supported on the Linux® platform only.

Verify the GPU Environment for Target Hardware

Use the coder.checkGpuInstall function to verify that the compilers and libraries needed for running this example are set up correctly.
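A minimal sketch of such a check follows. The option strings accepted by coder.checkGpuInstall vary by GPU Coder release, so treat the arguments here as illustrative and consult the function's documentation for your release.

```matlab
% Verify the environment for CUDA code generation with cuDNN.
% Option names ('gpu', 'codegen', 'cudnn') are illustrative; check the
% coder.checkGpuInstall documentation for the options in your release.
coder.checkGpuInstall('gpu', 'codegen', 'cudnn');
```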


Get the Pretrained SeriesNetwork

AlexNet contains 25 layers, including convolution, fully connected, and classification output layers.

net = getAlexnet();
net.Layers
ans = 

  25x1 Layer array with layers:

     1   'data'     Image Input                   227x227x3 images with 'zerocenter' normalization
     2   'conv1'    Convolution                   96 11x11x3 convolutions with stride [4  4] and padding [0  0  0  0]
     3   'relu1'    ReLU                          ReLU
     4   'norm1'    Cross Channel Normalization   cross channel normalization with 5 channels per element
     5   'pool1'    Max Pooling                   3x3 max pooling with stride [2  2] and padding [0  0  0  0]
     6   'conv2'    Convolution                   256 5x5x48 convolutions with stride [1  1] and padding [2  2  2  2]
     7   'relu2'    ReLU                          ReLU
     8   'norm2'    Cross Channel Normalization   cross channel normalization with 5 channels per element
     9   'pool2'    Max Pooling                   3x3 max pooling with stride [2  2] and padding [0  0  0  0]
    10   'conv3'    Convolution                   384 3x3x256 convolutions with stride [1  1] and padding [1  1  1  1]
    11   'relu3'    ReLU                          ReLU
    12   'conv4'    Convolution                   384 3x3x192 convolutions with stride [1  1] and padding [1  1  1  1]
    13   'relu4'    ReLU                          ReLU
    14   'conv5'    Convolution                   256 3x3x192 convolutions with stride [1  1] and padding [1  1  1  1]
    15   'relu5'    ReLU                          ReLU
    16   'pool5'    Max Pooling                   3x3 max pooling with stride [2  2] and padding [0  0  0  0]
    17   'fc6'      Fully Connected               4096 fully connected layer
    18   'relu6'    ReLU                          ReLU
    19   'drop6'    Dropout                       50% dropout
    20   'fc7'      Fully Connected               4096 fully connected layer
    21   'relu7'    ReLU                          ReLU
    22   'drop7'    Dropout                       50% dropout
    23   'fc8'      Fully Connected               1000 fully connected layer
    24   'prob'     Softmax                       softmax
    25   'output'   Classification Output         crossentropyex with 'tench' and 999 other classes

Generate Code for the SeriesNetwork

Generate code for the TX2 platform.

cfg = coder.gpuConfig('lib');                % build a static library
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn'); % use cuDNN for the deep learning layers
cfg.GenerateReport = true;                   % produce a code generation report
cfg.TargetLang = 'C++';                      % generate C++ code
cfg.Toolchain = 'NVIDIA CUDA for Jetson Tegra X2 | gmake (64-bit Linux)';
cfg.HardwareImplementation.TargetHWDeviceType = 'Generic->Custom';

codegen -config cfg -args {ones(227,227,3,'single'), coder.Constant('alexnet.mat')} alexnet_test.m
Code generation successful: To view the report, open('codegen/lib/alexnet_test/html/report.mldatx').
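The entry-point function alexnet_test.m passed to codegen is not shown in this example. A minimal sketch of such an entry point, assuming the conventional persistent-network pattern and the argument names implied by the codegen command above:

```matlab
% Sketch of an entry-point function matching the codegen command above.
% The function and MAT-file names are taken from that command; the body
% follows the standard persistent-network pattern for generated code.
function out = alexnet_test(in, matfile) %#codegen
persistent mynet;
if isempty(mynet)
    % Load the network from the MAT-file on the first call only.
    mynet = coder.loadDeepLearningNetwork(matfile, 'alexnet');
end
% Classify one 227x227x3 single-precision image.
out = mynet.predict(in);
end
```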

Generated Code Description

The generated code is compiled into a static library, alexnet_test.a. The generated code includes the entry-point function code, the network classes, and binary weight files containing the network coefficients.

dir(fullfile('codegen', 'lib', 'alexnet_test'))
.                              alexnet_test.d               cnn_alexnet_fc6_b
..                             alexnet_test.h               cnn_alexnet_fc6_w
DeepLearningNetwork.d          alexnet_test.o               cnn_alexnet_fc7_b
DeepLearningNetwork.h          alexnet_test_initialize.d    cnn_alexnet_fc7_w
DeepLearningNetwork.o          alexnet_test_initialize.h    cnn_alexnet_fc8_b
MWCNNLayerImpl.d               alexnet_test_initialize.o    cnn_alexnet_fc8_w
MWCNNLayerImpl.hpp             alexnet_test_ref.rsp         cnn_alexnet_labels.txt
MWCNNLayerImpl.o               alexnet_test_terminate.h     cnn_api.cpp
MWCudaDimUtility.d             alexnet_test_terminate.o     cnn_api.d
MWCudaDimUtility.h             alexnet_test_types.h         cnn_api.hpp
MWCudaDimUtility.o             buildInfo.mat                cnn_api.o
MWFusedConvReLULayer.cpp       cnn_alexnet_avg              codeInfo.mat
MWFusedConvReLULayer.d         cnn_alexnet_conv1_b          examples
MWFusedConvReLULayer.hpp       cnn_alexnet_conv1_w          gpu_codegen_info.mat
MWFusedConvReLULayer.o         cnn_alexnet_conv2_b          html
MWFusedConvReLULayerImpl.d     cnn_alexnet_conv2_w          interface
MWFusedConvReLULayerImpl.hpp   cnn_alexnet_conv3_b          predict.d
MWFusedConvReLULayerImpl.o     cnn_alexnet_conv3_w          predict.h
MWTargetNetworkImpl.d          cnn_alexnet_conv4_b          predict.o
MWTargetNetworkImpl.hpp        cnn_alexnet_conv4_w          rtw_proj.tmw
MWTargetNetworkImpl.o          cnn_alexnet_conv5_b          rtwtypes.h
alexnet_test.a                 cnn_alexnet_conv5_w

Main File

The custom main file creates and sets up the network object with its layers and weights. It uses the OpenCV VideoCapture method to read frames from the camera connected to the TX2. Each frame is processed and classified until no more frames can be read.
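The frame loop described above can be sketched as follows. The generated header name and entry-point signature are assumptions based on the codegen step; the OpenCV calls follow the standard VideoCapture API. The actual call into the generated code is shown as a comment because its exact signature depends on the generated interface.

```cpp
// Sketch of the custom main file's frame loop (assumed names; the generated
// header and entry-point signature come from the codegen step).
#include <cstdlib>
#include <opencv2/opencv.hpp>
#include "alexnet_test.h"   // generated entry-point declaration (assumed)

int main(int argc, char* argv[])
{
    int camIndex = (argc > 1) ? std::atoi(argv[1]) : 0;
    cv::VideoCapture cap(camIndex);          // open the USB webcam
    if (!cap.isOpened()) return -1;

    cv::Mat frame, resized;
    while (true)
    {
        cap >> frame;                        // grab one frame
        if (frame.empty()) break;            // stop when no more frames

        // AlexNet expects a 227x227x3 single-precision input.
        cv::resize(frame, resized, cv::Size(227, 227));
        resized.convertTo(resized, CV_32FC3);

        // Call the generated classification code here (signature assumed):
        // alexnet_test(reinterpret_cast<float*>(resized.data), outBuffer);

        cv::imshow("AlexNet", frame);        // show the live feed
        if (cv::waitKey(1) == 27) break;     // Esc quits
    }
    return 0;
}
```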

edit(fullfile(matlabroot,'examples','deeplearning_shared','main', ''));

Copy Files to the Codegen Folder

% Copy the files required for the executable.

copyfile('', fullfile('codegen', 'lib', 'alexnet_test', ''));
copyfile('synsetWords.txt', fullfile('codegen', 'lib', 'alexnet_test', 'synsetWords.txt'));
copyfile('', fullfile('codegen', 'lib', 'alexnet_test', ''));

Build and Run on Target Hardware

Copy the contents of the codegen folder to a location on the TX2.

scp -r ./codegen/lib/alexnet_test username@jetson-tx2-name:/path/to/desired/location

On the TX2, navigate to the copied codegen folder and execute the following commands.

sudo ~/

Running this script boosts TX2 performance.

Run make to generate an executable from the main file, the static library alexnet_test.a, and the OpenCV libraries.

make -f

Run the executable on the TX2 platform with a device number for your webcam.

./alexnet_exe 1

This command displays a live video feed from the webcam accompanied by the AlexNet predictions for the current image. Press the Esc key at any time to quit.

AlexNet Classification Output on TX2
