GPU problem CUDA_ERROR_UNKNOWN

Peter on 4 Jan 2017
Commented: Xubin Lin on 13 Jun 2020
I'm running a MATLAB simulation that uses an iterative matrix-equation solver. The solver is called on the GPU every few time steps inside a time-stepping loop. This works for a few dozen time steps (although the computations gradually slow down) until the screen goes black for an instant and the simulation crashes with the following error message:
Error using gpuArray/subsasgn
An unexpected error occurred during CUDA execution. The CUDA error was:
CUDA_ERROR_UNKNOWN
After this, MATLAB no longer recognizes the GPU device. The command
gpuDevice
results in:
Error using gpuDevice (line 26)
An unexpected error occurred trying to retrieve CUDA device properties. The CUDA error was:
CUDA_ERROR_UNKNOWN
Restarting MATLAB is not sufficient to restore the GPU; restarting the PC is.
I'm running MATLAB R2016b on Windows 10 with an Nvidia TITAN X (Pascal) GPU and the newest driver installed.
Do these symptoms suggest a diagnosis to anyone?
  2 Comments
Xubin Lin on 13 Jun 2020
Dear Joss,
I also have the same problem.
An error occurred during PTX compilation of <image>.
The information log was:
The error log was:
The CUDA error code was: CUDA_ERROR_ILLEGAL_ADDRESS.
The output of gpuDevice is as follows (MATLAB R2019a and CUDA 10.2):
Name: 'GeForce GTX 1060'
Index: 1
ComputeCapability: '6.1'
SupportsDouble: 1
DriverVersion: 11
ToolkitVersion: 10
MaxThreadsPerBlock: 1024
MaxShmemPerBlock: 49152
MaxThreadBlockSize: [1024 1024 64]
MaxGridSize: [2.1475e+09 65535 65535]
SIMDWidth: 32
TotalMemory: 6.4425e+09
MultiprocessorCount: 10
ClockRateKHz: 1670500
ComputeMode: 'Default'
GPUOverlapsTransfers: 1
KernelExecutionTimeout: 1
CanMapHostMemory: 1
DeviceSupported: 1
DeviceSelected: 1

Accepted Answer

Peter on 6 Jan 2017
Monitoring the GPU showed that temperature is most probably the cause: performance drops as the temperature rises, and performance is capped by temperature.
The GPU crashed when it reached 95 degrees...
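For anyone who wants to check for the same correlation, here is a minimal sketch that logs the GPU temperature next to the time taken by each step. It assumes nvidia-smi (installed with the Nvidia driver) is on the system PATH and that there is a single GPU; nSteps, tempLog, timeLog and the solver placeholder are illustrative names, not the original simulation code.

```matlab
% Sketch: log GPU temperature next to the wall-clock time of each step.
% Assumption: nvidia-smi is on the PATH and reports exactly one GPU.
query = 'nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits';
parseTemp = @(txt) str2double(strtrim(txt));  % numeric text -> number, else NaN

nSteps  = 5;                  % hypothetical number of time steps
tempLog = nan(nSteps, 1);     % temperature as reported by nvidia-smi (deg C)
timeLog = nan(nSteps, 1);     % seconds per step

for k = 1:nSteps
    t0 = tic;
    % ... call the GPU solver for this time step here ...
    % For accurate GPU timings, call wait(gpuDevice) before toc, because
    % GPU operations can return before the computation has finished.
    timeLog(k) = toc(t0);

    [status, out] = system(query);
    if status == 0
        tempLog(k) = parseTemp(out);   % stays NaN if the query failed
    end
    fprintf('step %d: %.3f s, GPU temperature %g\n', k, timeLog(k), tempLog(k));
end
```

If the step times in timeLog grow as tempLog rises, thermal throttling is a likely explanation. With several GPUs installed, nvidia-smi prints one line per device, so the parsing above would need to be adapted.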

More Answers (2)

Matt J on 4 Jan 2017
I've had symptoms like that before. Re-installing/updating the GPU driver fixed it for me, but it was never clear to me what the root cause was.
  1 Comment
Peter on 5 Jan 2017
Thanks Matt. I installed the latest drivers (several times now) hoping that would solve the issue, but unfortunately without success.


shirin on 4 Aug 2017
I have the same problem, and nothing has worked so far. Has anyone found a solution?
  2 Comments
Peter on 23 Apr 2018
I solved it by: 1) placing the GPU more sensibly in the PC case to allow better airflow; 2) changing the behavior of the cooling fan. By default it only reacts to CPU activity; this can be set in the BIOS, I believe. I just made it blow a little harder. This is all very machine-specific, so it will take some investigating on your part to try these options.
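If the hardware cannot be changed right away, a software-side stopgap is to pause the loop whenever the GPU gets too hot. This is not what was done above, only a rough sketch under assumptions: nvidia-smi is on the PATH and reports a single GPU, and tempLimit, coolDownSec, maxWaits and nSteps are illustrative values rather than recommendations.

```matlab
% Sketch: pause the time-stepping loop while the GPU is above a chosen limit.
% Assumption: nvidia-smi is on the PATH and reports exactly one GPU.
query = 'nvidia-smi --query-gpu=temperature.gpu --format=csv,noheader,nounits';
tempLimit   = 85;   % hypothetical threshold in deg C, below the reported crash point
coolDownSec = 10;   % hypothetical wait between temperature checks
maxWaits    = 30;   % stop waiting after this many checks
nSteps      = 3;    % hypothetical number of time steps

for k = 1:nSteps
    for w = 1:maxWaits
        [status, out] = system(query);
        temp = str2double(strtrim(out));
        if status ~= 0 || isnan(temp) || temp < tempLimit
            break;  % no reading available, or cool enough: carry on
        end
        fprintf('GPU at %g deg C, pausing %d s\n', temp, coolDownSec);
        pause(coolDownSec);
    end
    % ... call the GPU solver for this time step here ...
end
```

This trades run time for stability and does not address the underlying cooling problem, so it is best treated as a temporary measure.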
