Processor-In-The-Loop Execution from Command Line

Use the processor-in-the-loop (PIL) execution to check the numerical behavior of the CUDA® code that you generate from MATLAB® functions. A PIL simulation, which requires target connectivity, compiles generated source code, and then downloads and runs object code on NVIDIA® GPU platforms. The results of the PIL simulation are transferred to MATLAB to verify the numerical equivalence of the simulation and the code generation results.

The PIL verification process is a crucial part of the design cycle to check that the behavior of the generated code matches the design. PIL verification requires an Embedded Coder® license.

Note

When using PIL execution, make sure that the Benchmarking option in GPU Coder™ settings is false. Executing PIL with benchmarking results in compilation errors.

Prerequisites

Target Board Requirements

  • NVIDIA DRIVE or Jetson embedded platform.

  • Ethernet crossover cable to connect the target board and host PC (if the target board cannot be connected to a local network).

  • NVIDIA CUDA toolkit installed on the board.

  • Environment variables on the target for the compilers and libraries. For information on the supported versions of the compilers and libraries and their setup, see Install and Setup Prerequisites for NVIDIA Boards.

Development Host Requirements

  • GPU Coder for code generation. For an overview and tutorials, see the Getting Started with GPU Coder (GPU Coder) page.

  • Embedded Coder.

  • NVIDIA CUDA toolkit on the host.

  • Environment variables on the host for the compilers and libraries. For information on the supported versions of the compilers and libraries, see Third-party Products (GPU Coder). For setting up the environment variables, see Environment Variables (GPU Coder).

Example: The Mandelbrot Set

Description

You do not have to be familiar with the algorithm in the example to complete the tutorial.

The Mandelbrot set is the region in the complex plane consisting of the values z0 for which the trajectories defined by

zk+1=zk2+z0,k=0,1,

remain bounded at k→∞. The overall geometry of the Mandelbrot set is shown in the figure. This view does not have the resolution to show the richly detailed structure of the fringe just outside the boundary of the set. At increasing magnifications, the Mandelbrot set exhibits an elaborate boundary that reveals progressively finer recursive detail.

Algorithm

Create a MATLAB script called mandelbrot_count.m with the following lines of code. This code is a baseline vectorized MATLAB implementation of the Mandelbrot set.

function count = mandelbrot_count(maxIterations, xGrid, yGrid) %#codegen
% mandelbrot computation

z0 = xGrid + 1i*yGrid;
count = ones(size(z0));

% Add Kernelfun pragma to trigger kernel creation
coder.gpu.kernelfun;

z = z0;
for n = 0:maxIterations
    z = z.*z + z0;
    inside = abs(z)<=2;
    count = count + inside;
end
count = log(count);

For this tutorial, pick a set of limits that specify a highly zoomed part of the Mandelbrot set in the valley between the main cardioid and the p/q bulb to its left. A 1000x1000 grid of real parts (x) and imaginary parts (y) is created between these two limits. The Mandelbrot algorithm is then iterated at each grid location. An iteration number of 500 is enough to render the image in full resolution. Create a MATLAB script called mandelbrot_test.m with the following lines of code. It also calls the mandelbrot_count function and plots the resulting Mandelbrot set.

maxIterations = 500;
gridSize = 1000;
xlim = [-0.748766713922161, -0.748766707771757];
ylim = [ 0.123640844894862,  0.123640851045266];

x = linspace( xlim(1), xlim(2), gridSize );
y = linspace( ylim(1), ylim(2), gridSize );
[xGrid,yGrid] = meshgrid( x, y );

count = mandelbrot_count(maxIterations, xGrid, yGrid);

figure(1)
imagesc( x, y, count );
colormap([jet();flipud( jet() );0 0 0]);
axis off
title('Mandelbrot set');

Create a Live Hardware Connection Object

To communicate with the NVIDIA hardware, you must create a live hardware connection object by using the jetson or drive function. To create a live hardware connection object, provide the host name or IP address, user name, and password of the target board. For example to create live object for Jetson hardware:

hwobj = jetson('192.168.1.15','ubuntu','ubuntu');

The software performs a check of the hardware, compiler tools and libraries, IO server installation, and gathers peripheral information on target. This information is displayed in the command window.

Checking for CUDA availability on the Target...
Checking for 'nvcc' in the target system path...
Checking for cuDNN library availability on the Target...
Checking for TensorRT library availability on the Target...
Checking for prerequisite libraries is complete.
Gathering hardware details...
Gathering hardware details is complete.
 Board name        : NVIDIA Jetson TX2
 CUDA Version      : 9.0
 cuDNN Version     : 7.0
 TensorRT Version  : 3.0
 Available Webcams : Microsoft® LifeCam Cinema(TM)
 Available GPUs    : NVIDIA Tegra X2

Alternatively, to create live object for DRIVE hardware:

hwobj = drive('92.168.1.16','nvidia','nvidia');

Note

If there is a connection failure, a diagnostics error message is reported on the MATLAB command window. If the connection has failed, the most likely cause is incorrect IP address or host name.

Configure the PIL Execution

Create a GPU code configuration object for generating a library and configure the object for PIL. Use the coder.hardware function to create a configuration object for the DRIVE or Jetson platform and assign it to the Hardware property of the code configuration object cfg. Use 'NVIDIA Jetson' for the Jetson boards and 'NVIDIA Drive' for the DRIVE boards.

cfg = coder.gpuConfig('lib','ecoder',true);
cfg.GpuConfig.CompilerFlags = '--fmad=false';
cfg.VerificationMode = 'PIL';
cfg.GenerateReport = true;
cfg.Hardware = coder.hardware('NVIDIA Jetson');

The --fmad=false flag when passed to nvcc, instructs the compiler to disable Floating-Point Multiply-Add (FMAD) optimization. This option is set to prevent numerical mismatch in the generated code because of architectural differences in the CPU and the GPU. For more information, see Numerical Differences Between CPU and GPU (GPU Coder).

Generate Code and Run PIL Execution

To generate CUDA library and the PIL interface, use the codegen command and pass the GPU code configuration object along with the size of the inputs for the mandelbrot_count entry-point function. The -test option runs the MATLAB test file, mandelbrot_test. The test file uses mandelbrot_count_pil, the generated PIL interface for mandelbrot_count.

codegen -config cfg -args {0,zeros(1000),zeros(1000)} mandelbrot_count -test mandelbrot_test
### Connectivity configuration for function 'mandelbrot_count': 'NVIDIA Jetson'
Code generation successful: View report
Running test file: 'mandelbrot_test' with MEX function 'mandelbrot_count_pil.mexa64'.
### Starting application: 'codegen/lib/mandelbrot_count/pil/mandelbrot_count.elf'
    To terminate execution: clear mandelbrot_count_pil
### Launching application mandelbrot_count.elf...

The software creates the following output folders:

  • codegen\lib\mandelbrot_count — Standalone code for mandelbrot_count.

  • codegen\lib\mandelbrot_count\pil — PIL interface code for mandelbrot_count.

Verify that the output of this run matches the output from the original mandelbrot_count.m function.

Note

On a Microsoft®Windows® system, the Windows Firewall can potentially block a PIL execution. Change the Windows Firewall settings to allow access.

Terminate the PIL Execution Process.

To terminate the PIL execution process.

clear mandelbrot_count_pil;

See Also

| | | | | | | |

Related Examples

More About