Cross Compile Deep Learning Code for ARM Neon Targets

This example shows how to cross-compile the generated deep learning code to create a library or an executable, and then deploy the library or executable on an ARM® target such as Hikey 960 or Rock 960. This example uses the codegen command.

Cross compiling the deep learning code for ARM® targets involves these steps:

1. Configure the installed cross-compiler toolchain so that compilation takes place on the host computer. The compilation happens when you run the codegen command in MATLAB® on the host computer.
2. Use the codegen command to build the generated code and create a library or an executable on the host computer.
3. Copy the generated library or executable and other supporting files to the target hardware. If you generate a library on the host computer, run the copied makefile on the target to create an executable.
4. Run the generated executable on the target ARM hardware.

You can use this workflow for any ARM target that supports the Neon SIMD instruction set. This example is supported only on Linux® host platforms.

Prerequisites

ARM processor that supports the Neon SIMD extension
ARM Compute Library (on the host computer)
MATLAB® Coder™
The support package MATLAB Coder Interface for Deep Learning
Deep Learning Toolbox™
The support package Deep Learning Toolbox Model for Inception-v3 Network
Image Processing Toolbox™
For deployment on an armv7 (32-bit ARM architecture) target, the GNU/GCC g++-arm-linux-gnueabihf toolchain
For deployment on an armv8 (64-bit ARM architecture) target, the GNU/GCC g++-aarch64-linux-gnu toolchain
Environment variables for the cross compilers and libraries

For information about how to install the cross-compiler toolchain and set up the associated environment variable, see Cross-Compile Deep Learning Code That Uses ARM Compute Library.
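Before generating code, it can help to confirm that the environment variables are visible to the MATLAB session. The following is a minimal sketch, not part of the original example. ARM_COMPUTELIB is the variable that this example reads later; the toolchain variable name LINARO_TOOLCHAIN_AARCH64 is an assumption and might differ on your system.

```matlab
% Environment variables to check. ARM_COMPUTELIB is read later in this
% example. The toolchain variable name is an assumption; adjust as needed.
requiredVars = {'ARM_COMPUTELIB', 'LINARO_TOOLCHAIN_AARCH64'};

isSet = false(size(requiredVars));
for k = 1:numel(requiredVars)
    % getenv returns an empty value when the variable is not defined
    isSet(k) = ~isempty(getenv(requiredVars{k}));
    if ~isSet(k)
        warning('Environment variable %s is not set.', requiredVars{k});
    end
end
```

If a warning appears, set the variable before you start MATLAB, or set it in the current session by using setenv.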

The ARM Compute library version that this example uses might not be the latest version that code generation supports. For information about supported versions of libraries and about environment variables, see Prerequisites for Deep Learning with MATLAB Coder.

The code lines in this example are commented out. Uncomment them before you run the example.

This example is not supported in MATLAB Online.

The `inception_predict_arm` Entry-Point Function

This example uses the Inception-V3 image classification network. A pretrained Inception-V3 network for MATLAB is available in the support package Deep Learning Toolbox Model for Inception-V3 Network. The inception_predict_arm entry-point function loads the Inception-V3 network into a persistent network object. On subsequent calls to the function, the persistent object is reused.

type inception_predict_arm

function out = inception_predict_arm(in)

persistent net;
if isempty(net)
    net = coder.loadDeepLearningNetwork('inceptionv3','inceptionv3');
end

out = net.predict(in);

end
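Before cross-compiling, you might want to run the entry-point function in MATLAB on the host to confirm that the network loads and returns scores. This is a sketch under assumptions: it requires the Inception-v3 support package, and the resize-and-cast preprocessing is only a guess at what the deployed input looks like. The 299-by-299-by-3 single input size matches the codegen arguments used later in this example.

```matlab
% Host-side check of the entry-point function (assumed preprocessing).
im = imread('peppers.png');              % same image that this example classifies
in = single(imresize(im, [299 299]));    % 299-by-299-by-3 single input

out = inception_predict_arm(in);         % loads the network on the first call
[score, classIdx] = max(out);            % highest-scoring class
fprintf('Top class index: %d (score %.3f)\n', classIdx, score);
```

The class index refers to the position in net.Layers(end).Classes, which is the same ordering that synsetWords.txt holds.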

Set up a Deep Learning Configuration Object

Create a coder.ARMNEONConfig object by using coder.DeepLearningConfig. Specify the version of the ARM Compute Library and the ARM architecture.

dlcfg = coder.DeepLearningConfig('arm-compute');
dlcfg.ArmComputeVersion = '20.02.1';
dlcfg.ArmArchitecture = 'armv8'; % or 'armv7'

To classify the input image peppers.png, first convert the image to a text file.

generateImagetoTxt('peppers.png');

First Approach: Create Static Library for Entry-Point Function on Host

In this approach, you first cross-compile the generated code to create a static library on the host computer. You then transfer the generated static library, the ARM Compute library files, the makefile, and other supporting files to the target hardware. You run the makefile on the target hardware to generate the executable. Finally, you run the executable on the target hardware.

Set Up a Code Generation Configuration Object

Create a code generation configuration object for a static library. Specify the target language as C++.

cfg = coder.config('lib');
cfg.TargetLang = 'C++';

Attach the deep learning configuration object to the code generation configuration object.

cfg.DeepLearningConfig = dlcfg;

Configure the Cross-Compiler Toolchain

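Configure the cross-compiler toolchain based on the ARM architecture of the target device. Set only the toolchain that matches the dlcfg.ArmArchitecture value. If both assignments run, the second overrides the first.

cfg.Toolchain = 'Linaro AArch64 Linux v6.3.1'; % When the ARM architecture is armv8

% cfg.Toolchain = 'Linaro AArch32 Linux v6.3.1'; % When the ARM architecture is armv7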
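One way to keep the toolchain consistent with the architecture setting is to derive the toolchain name from the architecture string. The following sketch is an illustration only. The variable names armArch and toolchainName are hypothetical, and the toolchain names are the two that this example lists.

```matlab
% Architecture string, kept in step with dlcfg.ArmArchitecture.
armArch = 'armv8';   % or 'armv7'

% Pick the matching toolchain name from the two listed in this example.
switch armArch
    case 'armv8'
        toolchainName = 'Linaro AArch64 Linux v6.3.1';
    case 'armv7'
        toolchainName = 'Linaro AArch32 Linux v6.3.1';
    otherwise
        error('Unsupported ARM architecture: %s', armArch);
end

% cfg.Toolchain = toolchainName;   % uncomment after you create cfg
```

The same selection applies whether cfg is a library configuration or an executable configuration.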

Generate Static Library on Host Computer by Using codegen

Use the codegen command to generate code for the entry-point function, build the generated code, and create a static library for the target ARM architecture.

codegen -config cfg inception_predict_arm -args {ones(299,299,3,'single')} -d arm_compute_cc_lib -report

Copy the Generated Cross-Compiled Static Library to Target Hardware

Copy the static library, the bin files, and the header files from the generated folder arm_compute_cc_lib to the target ARM hardware. In this code line and other code lines that follow, replace:

password with your password
username with your username
hostname with the name of your device
targetDir with the destination folder for the files

system('sshpass -p password scp -r arm_compute_cc_lib/*.bin arm_compute_cc_lib/*.lib arm_compute_cc_lib/*.h arm_compute_cc_lib/*.hpp username@hostname:targetDir/');
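Because the same placeholders recur in every transfer command, you might prefer to define them once and assemble the command string from variables. The following is a sketch, not part of the original example. The variable names are hypothetical, and the values shown are the placeholders themselves rather than real credentials.

```matlab
% Connection details (placeholder values; replace with your own).
password  = 'password';
username  = 'username';
hostname  = 'hostname';
targetDir = 'targetDir';
buildDir  = 'arm_compute_cc_lib';   % folder passed to codegen with -d

% File patterns to copy from the build folder, as in the command above.
patterns = {'*.bin', '*.lib', '*.h', '*.hpp'};
sources  = strjoin(strcat(buildDir, '/', patterns), ' ');

% Assemble the copy command in one place.
copyCmd = sprintf('sshpass -p %s scp -r %s %s@%s:%s/', ...
    password, sources, username, hostname, targetDir);
disp(copyCmd);

% system(copyCmd);   % uncomment to perform the copy
```

With the placeholder values, copyCmd is identical to the command string shown above, so you can compare the two before substituting real values.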

Copy the ARM Compute Library Files to Target Hardware

The executable uses the ARM Compute library files at run time. The target board does not need the ARM Compute Library header files to build or run the executable. Copy the library files to the desired path on the target.

system(['sshpass -p password scp -r ' fullfile(getenv('ARM_COMPUTELIB'),'lib') ' username@hostname:targetDir/']);

Copy Supporting Files to Target Hardware

Copy these files to the target ARM hardware:

The makefile Makefile_Inceptionv3, which generates the executable from the static library.
The input image file inputimage.txt that you want to classify.
The text file synsetWords.txt that contains the ClassNames returned by net.Layers(end).Classes.
The main wrapper file main_inception_arm.cpp that calls the code generated for the inception_predict_arm function.

system('sshpass -p password scp synsetWords.txt ./Makefile_Inceptionv3 ./inputimage.txt ./main_inception_arm.cpp username@hostname:targetDir/');

Create the Executable on the Target

Run the makefile on the target to generate the executable from the static library. This makefile links the static library with the main wrapper file main_inception_arm.cpp and generates the executable.

system('sshpass -p password ssh username@hostname "make -C targetDir -f Makefile_Inceptionv3 arm_inceptionv3 "');

Run the Executable on the Target

Run the generated executable on the target. Before you run the executable, export LD_LIBRARY_PATH so that it points to the ARM Compute library files.

system('sshpass -p password ssh username@hostname "export LD_LIBRARY_PATH=targetDir/lib; cd targetDir;./inception_predict_arm.elf inputimage.txt out.txt"');
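The system function returns an exit status and the command output, which you can use to detect a failed remote run instead of discovering it later from a missing out.txt file. The following is a minimal sketch that reuses the command above with the same placeholders. With the placeholders left in place, the command fails and the warning is displayed.

```matlab
% Remote run command from the step above, with the same placeholders.
runCmd = ['sshpass -p password ssh username@hostname ' ...
    '"export LD_LIBRARY_PATH=targetDir/lib; cd targetDir;' ...
    './inception_predict_arm.elf inputimage.txt out.txt"'];

% system returns a nonzero status when the command fails.
[status, cmdOut] = system(runCmd);
if status ~= 0
    warning('Remote run failed (exit status %d):\n%s', status, cmdOut);
end
```

The same check can wrap the scp and make commands in this example. In a deployment script, you might raise an error instead of a warning so that later steps do not run after a failure.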

Second Approach: Create Executable for Entry-Point Function on Host

In this approach, you first cross-compile the generated code to create an executable on the host computer. You then transfer the generated executable, the ARM Compute library files, and other supporting files to the target hardware. Finally, you run the executable on the target hardware.

Set Up a Code Generation Configuration Object

Create a code generation configuration object for generating an executable. Set the target language to C++.

cfg = coder.config('exe');
cfg.TargetLang = 'C++';

Attach the deep learning configuration object to the code generation configuration object.

cfg.DeepLearningConfig = dlcfg;

Declare the main wrapper file main_inception_arm.cpp as the custom source file.

cfg.CustomSource = 'main_inception_arm.cpp';

Configure the Cross-Compiler Toolchain

Configure the cross-compiler toolchain based on the ARM architecture of the target device. Set only the toolchain that matches the dlcfg.ArmArchitecture value. If both assignments run, the second overrides the first.

cfg.Toolchain = 'Linaro AArch64 Linux v6.3.1'; % When the ARM architecture is armv8

% cfg.Toolchain = 'Linaro AArch32 Linux v6.3.1'; % When the ARM architecture is armv7

Generate Executable on the Host Computer by Using `codegen`

Use the codegen command to generate code for the entry-point function, build the generated code, and create an executable for the target ARM architecture.

codegen -config cfg inception_predict_arm -args {ones(299,299,3,'single')} -d arm_compute_cc_exe -report

Copy the Generated Executable to the Target Hardware

Copy the generated executable and the bin files to the target ARM hardware. In this code line and other code lines that follow, replace:

password with your password
username with your username
hostname with the name of your device
targetDir with the destination folder for the files

system('sshpass -p password scp -r arm_compute_cc_exe/*.bin username@hostname:targetDir/');
system('sshpass -p password scp inception_predict_arm.elf username@hostname:targetDir/');

Copy the ARM Compute Library Files to the Target Hardware

The executable uses the ARM Compute library files during runtime. It does not use header files at runtime. Copy the library files to the desired path.

system(['sshpass -p password scp -r ' fullfile(getenv('ARM_COMPUTELIB'),'lib') ' username@hostname:targetDir/']);

Copy Supporting Files to the Target Hardware

Copy these files to the target ARM hardware:

The input image file inputimage.txt that you want to classify.
The text file synsetWords.txt that contains the ClassNames returned by net.Layers(end).Classes.
The main wrapper file main_inception_arm.cpp that calls the code generated for the inception_predict_arm function.

system('sshpass -p password scp synsetWords.txt ./inputimage.txt ./main_inception_arm.cpp username@hostname:targetDir/');

Run the Executable on the Target Hardware

Run the generated executable on the target. Before you run the executable, export LD_LIBRARY_PATH so that it points to the ARM Compute library files.

system('sshpass -p password ssh username@hostname "export LD_LIBRARY_PATH=targetDir/lib; cd targetDir;./inception_predict_arm.elf inputimage.txt out.txt"');

Transfer the Output Data from Target to MATLAB

Copy the generated output back to the current MATLAB session on the host computer.

system('sshpass -p password scp username@hostname:targetDir/out.txt ./');

Map Prediction Scores to Labels

Map the top five prediction scores to corresponding labels in the trained network.

outputImage = mapPredictionScores;

imshow(outputImage);
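The source of the mapPredictionScores helper is not shown here. The core of such a mapping can be sketched as a top-five selection over the score vector. The sketch below uses stand-in scores and labels so that it runs on its own. The commented lines that load out.txt and synsetWords.txt assume one score and one label per line, which might not match the actual file layouts.

```matlab
% Stand-in data so that the sketch runs on its own. With real results,
% load the files instead (the file layouts are an assumption):
%   scores = readmatrix('out.txt');
%   labels = readlines('synsetWords.txt');
scores = single([0.02 0.61 0.05 0.20 0.01 0.08 0.03]);
labels = ["label1" "label2" "label3" "label4" "label5" "label6" "label7"];

% Select the five largest scores and their positions.
[topScores, topIdx] = maxk(scores, 5);

% Print each selected label next to its score, highest first.
for k = 1:numel(topIdx)
    fprintf('%-8s %.2f\n', labels(topIdx(k)), topScores(k));
end
```

The index returned by maxk selects the matching label because the labels follow the class ordering of the network output.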
