Some of the most common reasons why GPU Coder™ generated code is not performing as expected are:
- CUDA® kernels are not created.
- Host-to-device and device-to-host memory transfers (cudaMemcpy) are throttling performance.
- Not enough parallelism or device issues.
These topics elaborate on the common causes of these symptoms and describe how to use the built-in screener to detect these issues. They also explain how to work around these issues and generate more efficient CUDA code.
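As a rough illustration of how to check the first symptom (no kernels created), the sketch below uses a small entry-point function and generates a CUDA MEX function with a report. The function name `scaleAdd` and the input sizes are hypothetical, chosen only for this example. It assumes GPU Coder is installed and a supported NVIDIA GPU and toolchain are configured.

```matlab
% Save as scaleAdd.m (hypothetical example function, not from the product docs).
function out = scaleAdd(a, b) %#codegen
% Ask GPU Coder to map the computation in this function to GPU kernels.
coder.gpu.kernelfun;
out = 2 .* a + b;
end
```

```matlab
% Run from a separate script or the command line.
cfg = coder.gpuConfig('mex');   % configuration object for CUDA MEX generation
cfg.GenerateReport = true;      % produce a code generation report

x = ones(1, 1024, 'single');    % example input used to define argument types
codegen -config cfg -args {x, x} scaleAdd
```

In the resulting report, you can inspect the generated `.cu` files to see whether kernels were created and where `cudaMemcpy` calls appear. If no kernel shows up for a loop you expected to run in parallel, the topics listed on this page are the place to look for likely causes.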
|GPU Coder||Generate GPU code from MATLAB code|
|GPU Environment Check||Verify and set up GPU code generation environment|
|Generate C/C++ code from MATLAB code|
|Open GPU Coder app|
|Analyze and optimize performance of the generated code|
Programming for Code Generation
|Pragma that maps |
|Pragma that maps function to GPU kernels|
|Pragma to disable kernel creation for loops|
|Configuration parameters for CUDA code generation from MATLAB code by using GPU Coder|
|Configuration parameters for C/C++ code generation from MATLAB code|
|Configuration parameters for C/C++ code generation from MATLAB code with Embedded Coder|
|Create configuration object containing the parameters passed to
GPU Coder troubleshooting workflow.
- Code Generation Reports
Create and view reports generated during code generation.
- Trace Between Generated CUDA Code and MATLAB Source Code
Highlight sections of MATLAB code that run on the GPU.
- Generating a GPU Code Metrics Report for Code Generated from MATLAB Code
Create and explore a GPU static code metrics report.
- GPU Performance Analyzer
Visualize code metrics and identify optimization and tuning opportunities in your code.
- Debug CUDA MEX Functions
Suggestions for debugging CUDA MEX functions.
- Kernel Analysis
Recommendations for generating efficient CUDA kernels.
- Memory Bottleneck Analysis
Reduce memory bottleneck issues when using GPU Coder.
- Analysis with NVIDIA Profiler
Improve performance by using the information obtained from NVIDIA Profiler (nvvp).
- Register Count nvlink Error
Troubleshoot compilation failures due to a register count