Kernels from Library Calls
GPU Coder™ supports libraries optimized for CUDA® GPUs, such as cuBLAS, cuSOLVER, cuFFT, Thrust, cuDNN, and TensorRT.
The cuBLAS library is an implementation of Basic Linear Algebra Subprograms (BLAS) on top of the NVIDIA® CUDA runtime. It allows you to access the computational resources of the NVIDIA GPU.
The cuSOLVER library is a high-level package based on the cuBLAS and cuSPARSE libraries. It provides useful LAPACK-like features, such as common matrix factorization and triangular solve routines for dense matrices, a sparse least-squares solver, and an eigenvalue solver.
The cuFFT library provides a high-performance implementation of the Fast Fourier Transform (FFT) algorithm on NVIDIA GPUs. The cuBLAS, cuSOLVER, and cuFFT libraries are part of the NVIDIA CUDA Toolkit.
Thrust is a C++ template library for CUDA. The Thrust library ships with the CUDA Toolkit and allows you to take advantage of GPU-accelerated primitives, such as sort, to implement complex high-performance parallel applications.
The NVIDIA CUDA Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks. cuDNN provides highly tuned implementations for standard routines such as forward and backward convolution, pooling, normalization, and activation layers. NVIDIA TensorRT is a high-performance deep learning inference optimizer and runtime library. For more information, see Code Generation for Deep Learning Networks by Using cuDNN and Code Generation for Deep Learning Networks by Using TensorRT.
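As a rough sketch of how one of these deep learning libraries might be selected at the command line, the fragment below creates a GPU code configuration object and attaches a deep learning configuration to it. The MEX target is an assumption made only for illustration; the topics referenced above describe the supported workflows in detail.

```matlab
% Minimal sketch: select cuDNN (or TensorRT) for deep learning code generation.
cfg = coder.gpuConfig('mex');      % MEX target chosen only for illustration
cfg.TargetLang = 'C++';            % deep learning code generation targets C++
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');
% For TensorRT, use coder.DeepLearningConfig('tensorrt') instead.
```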
GPU Coder does not require a special pragma to generate kernel calls to libraries. During code generation, when you select the Enable cuBLAS option in the GPU Coder app or set config_object.GpuConfig.EnableCUBLAS = true at the command line, GPU Coder replaces some functionality with calls to the cuBLAS library. Likewise, when you select the Enable cuSOLVER option in the GPU Coder app or set config_object.GpuConfig.EnableCUSOLVER = true at the command line, GPU Coder replaces some functionality with calls to the cuSOLVER library.
For GPU Coder to replace high-level math functions with library calls, the following conditions must be met:
A GPU-specific library replacement must exist for these functions.
MATLAB® Coder™ data size thresholds must be satisfied.
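The command-line settings described above can be sketched as follows. The entry-point name linearSolve and the 512-by-512 input size are hypothetical and chosen only for illustration. Whether GPU Coder actually substitutes a cuBLAS or cuSOLVER call still depends on the two conditions listed above.

```matlab
% Assumes a hypothetical entry-point file linearSolve.m containing:
%   function x = linearSolve(A, b) %#codegen
%   x = A \ b;
%   end

cfg = coder.gpuConfig('mex');          % GPU code configuration object (MEX target)
cfg.GpuConfig.EnableCUBLAS = true;     % allow replacement with cuBLAS calls
cfg.GpuConfig.EnableCUSOLVER = true;   % allow replacement with cuSOLVER calls

% Example inputs that define the argument types and sizes (illustrative sizes).
A = rand(512, 'single');
b = rand(512, 1, 'single');

codegen -config cfg -args {A,b} linearSolve
```

Inspecting the generated CUDA code or the code generation report is one way to check whether a library call was emitted for a given operation.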
GPU Coder supports cuFFT, cuSOLVER, and cuBLAS library replacements for the functions listed in the table. For functions that do not have replacements in CUDA, GPU Coder uses portable MATLAB functions that are mapped to the GPU.
|MATLAB Function||Description||MATLAB Coder LAPACK Support||cuBLAS, cuSOLVER, cuFFT, Thrust Support|
| |Solve system of linear equations| | |
| |LU matrix factorization| | |
| |Reciprocal condition number| | |
| |Solve system of linear equations| | |
| |Eigenvalues and eigenvectors| | |
| |Singular value decomposition| | |
| |Fast Fourier Transform| | |
| |Inverse Fast Fourier Transform| | |
| |Sort array elements| | |
When you select the Enable cuFFT option in the GPU Coder app or set config_object.GpuConfig.EnableCUFFT = true at the command line, GPU Coder maps fft, ifft, fft2, ifft2, fftn, and ifftn function calls in your MATLAB code to the corresponding cuFFT library calls. For 2-D transforms and higher, GPU Coder creates multiple 1-D batched transforms. These batched transforms have higher performance than single transforms. GPU Coder supports only out-of-place transforms. If Enable cuFFT is not selected, GPU Coder uses C FFTW libraries where available or generates kernels from portable MATLAB FFT. Both single-precision and double-precision data types are supported. Input and output can be real or complex-valued, but real-valued transforms are faster. The cuFFT library supports input sizes that are typically specified as a power of 2 or a value that can be factored into a product of small prime numbers. In general, the smaller the prime factor, the better the performance.
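The sizing guidance and the cuFFT setting can be combined in a short sketch. The entry-point name spectrum, the starting length 1009, and the cutoff of 7 for a "small" prime are all assumptions made for illustration and are not documented limits.

```matlab
% Assumes a hypothetical entry-point file spectrum.m containing:
%   function Y = spectrum(x) %#codegen
%   Y = fft(x);
%   end

% Pick a transform length whose prime factors are all small.
% The cutoff of 7 is an illustrative assumption, not a documented limit.
nfft = 1009;                        % illustrative starting length (a prime)
while max(factor(nfft)) > 7
    nfft = nfft + 1;                % try the next length
end

cfg = coder.gpuConfig('mex');       % GPU code configuration object (MEX target)
cfg.GpuConfig.EnableCUFFT = true;   % map fft calls to cuFFT library calls

x = rand(1, nfft, 'single');        % real-valued, single-precision input
codegen -config cfg -args {x} spectrum
```

With the starting value shown, the loop stops at 1024, which is a power of 2. The input is real-valued because, as noted above, real-valued transforms are faster.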
Using CUDA library names such as cudnn as the name of your MATLAB function results in code generation errors.