Why is MATLAB gpuArray sparse matrix multiplication so fast despite using double precision?

Di Xiao on 29 Apr 2020
Commented: Di Xiao on 24 Jun 2021
I am multiplying a large sparse matrix by a dense matrix using gpuArray. On my GTX 1080, MATLAB's sparse matrix multiplication runs in 5.04 ms (only the multiplication is timed, with tic/toc):
tic
gpu_mmm = gpu_matrix * gpu_input;
mvm_time = toc;
I also have a CUDA 10.2 implementation of the same sparse matrix multiplication using cuSPARSE, which runs in 7.25 ms (timed with the NVIDIA profiler). However, my CUDA implementation uses float32, while the MATLAB implementation only supports sparse matrices of type double. To my knowledge, GPUs are much faster at single-precision arithmetic than at double-precision arithmetic, so I am wondering why MATLAB performs this calculation faster despite the difference in precision.
Di Xiao on 24 Jun 2021
Sorry for the late response! I redid the timing experiment on an RTX 2080 Ti, and the cuSPARSE implementation was faster for me by around 2x.

Answers (2)

Andrea Picciau on 30 Apr 2020
Hello there!
The correct way to time GPU operations is by using gputimeit.
mvm_time = gputimeit(@() gpu_matrix*gpu_input, 1);
or, alternatively:
gpu = gpuDevice();
tic
gpu_mmm = gpu_matrix * gpu_input;
wait(gpu);
mvm_time = toc;
I suggest you try measuring your code like this...
Andrea Picciau on 1 May 2020
GPU operations are executed asynchronously, which means that most of the time control returns to you as soon as the operations are launched, before they have finished. wait(gpu) blocks until the GPU has completed the queued work, so you measure the whole duration of the computation; gputimeit does something similar under the hood.
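To see the effect described above, here is a rough MATLAB sketch that times the same product three ways: naive tic/toc, tic/toc followed by wait(gpu), and gputimeit. It assumes Parallel Computing Toolbox and a supported GPU; the matrix size, density, and number of right-hand-side columns are hypothetical placeholders rather than the original poster's data, and the timings will depend on your hardware.

```matlab
% Rough sketch: three ways to time a sparse-times-dense product on the GPU.
% Assumes Parallel Computing Toolbox and a supported GPU. The sizes below are
% hypothetical placeholders, not the matrices from the question.
n = 20000;
gpu_matrix = gpuArray(sprand(n, n, 1e-3));  % sparse double on the GPU
gpu_input  = randn(n, 64, 'gpuArray');      % dense double right-hand side
gpu = gpuDevice();

% Warm-up call so one-time setup costs are not counted in the measurements.
gpu_mmm = gpu_matrix * gpu_input;
wait(gpu);

% 1) Naive: toc may run before the GPU has finished the queued work.
tic;
gpu_mmm = gpu_matrix * gpu_input;
t_naive = toc;
wait(gpu);  % drain the queue before the next measurement

% 2) Synchronized: wait(gpu) blocks until the multiplication is complete.
tic;
gpu_mmm = gpu_matrix * gpu_input;
wait(gpu);
t_synced = toc;

% 3) gputimeit calls the function several times and handles synchronization.
t_gputimeit = gputimeit(@() gpu_matrix * gpu_input, 1);

fprintf('naive: %.3f ms, synced: %.3f ms, gputimeit: %.3f ms\n', ...
    1e3 * t_naive, 1e3 * t_synced, 1e3 * t_gputimeit);
```

Only the synchronized numbers are a fair comparison with a profiler measurement of the CUDA implementation, and even those include MATLAB's host-side call overhead, not just kernel time.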

Edric Ellis on 30 Apr 2020
You should use gputimeit to time operations on the GPU (although I'm not certain it will actually make a difference in this case). Behind the scenes, gpuArray is simply using the cuSPARSE routines in double precision, so it should show basically the same performance...
Joss Knight on 2 May 2020
If you are doing two separate multiplies rather than promoting the sparse array to complex and using the cusparseCgemm routine, then that is almost certainly where the difference comes from. MATLAB is also very efficient about memory allocation, so any remaining discrepancy could come down to the way you are managing memory.
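The two strategies in this comment can be illustrated with a small MATLAB sketch, assuming (as the comment implies) a real sparse matrix multiplied by a complex dense input. Multiplying by the real and imaginary parts separately and recombining gives the same result as a single complex product, up to rounding; the difference lies in how the work is split into library calls. The sizes and the useGpu flag are hypothetical: with useGpu = false the sketch runs on the CPU, and setting it to true (Parallel Computing Toolbox required) also times both approaches with gputimeit.

```matlab
% Sketch: real sparse matrix times complex dense input, computed two ways.
% Sizes and the useGpu flag are hypothetical placeholders.
useGpu = false;  % set to true to run on the GPU (needs Parallel Computing Toolbox)

A  = sprand(2000, 2000, 0.01);  % real sparse double
Xr = randn(2000, 8);            % real part of the dense input
Xi = randn(2000, 8);            % imaginary part of the dense input
if useGpu
    A  = gpuArray(A);
    Xr = gpuArray(Xr);
    Xi = gpuArray(Xi);
end
X = complex(Xr, Xi);

% Approach 1: two separate real multiplies, recombined afterwards.
Y_split = complex(A * Xr, A * Xi);

% Approach 2: one mixed real/complex multiply. Per the comment above, MATLAB
% presumably promotes the sparse matrix to complex internally on the GPU.
Y_single = A * X;

% Both approaches compute the same product up to rounding error.
rel_err = norm(Y_split - Y_single, 'fro') / norm(Y_single, 'fro');
if useGpu
    rel_err = gather(rel_err);
end
fprintf('relative difference: %g\n', rel_err);

if useGpu
    t_split  = gputimeit(@() complex(A * Xr, A * Xi));
    t_single = gputimeit(@() A * X);
    fprintf('two multiplies: %.3f ms, one complex multiply: %.3f ms\n', ...
        1e3 * t_split, 1e3 * t_single);
end
```

The CPU run only checks that the two formulations agree; whether one call beats two on a given GPU is something to measure, not assume.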
