GPU arrayfun with shared arrays

Ray on 11 Nov 2014
Edited: Matt J on 14 Nov 2014
Hi all,
I'm trying to speed up some code by using the GPU support that comes with arrayfun.
I know arrayfun operates element-wise; however, my function also uses some shared arrays. For example, I have a function like:
out = f(a,b,A,B,C), where a and b are (n x 1) arrays, i.e. the element-wise inputs, and A, B, C are arrays that stay constant across every element-wise evaluation over a and b.
I've searched for how to implement this, but the results don't look promising. Is it possible to do this with arrayfun? If not, is there another way to speed up such a function? I've tried using parfor, but it actually turned out to be slower than a normal for-loop.
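For reference, the serial setup described above can be sketched as a plain loop. The body of f is a hypothetical placeholder (the question does not show the real f), chosen only so that every element-wise call reads all of A, B and C; baselineLoop is likewise an illustrative name.

```matlab
function y = baselineLoop(a, b, A, B, C)
% Serial baseline: one call to the (hypothetical) element-wise function f
% per element of a and b, with A, B, C shared by every call.
y = zeros(size(a));
for i = 1:numel(a)
    y(i) = f(a(i), b(i), A, B, C);
end
end

function yi = f(ai, bi, A, B, C)
% Placeholder body, not the asker's real f: it reads every element of the
% shared arrays A, B, C for each scalar pair (ai, bi).
yi = sum(A(:) .* ai + B(:) .* bi + C(:));
end
```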
Thanks,
Ray

Answers (3)

Matt J on 11 Nov 2014
Edited: Matt J on 11 Nov 2014
The only hope, I think, would be to write your own CUDA kernel implementation of f(), putting A, B, C in constant memory if they are small enough to fit there. You could manage this from MATLAB using a CUDAKernel object; see the parallel.gpu.CUDAKernel documentation and its setConstantMemory method.
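A minimal sketch of the CUDAKernel route, assuming nvcc is on the system path and a Parallel Computing Toolbox release that provides parallel.gpu.CUDAKernel and setConstantMemory. The kernel body is a hypothetical placeholder for f (sum over k of A(k)*a(i) + B(k)*b(i) + C(k)); the function name fWithConstantMemory, the kernel name fKernel, the temp file names and the capacity MAXN = 1024 are illustrative choices, not anything from the thread. Each thread handles one element of a and b, and all threads read A, B, C from constant memory (64 KB total on CUDA devices).

```matlab
function y = fWithConstantMemory(a, b, A, B, C)
% Sketch: evaluate a hypothetical placeholder f on the GPU with a hand-written
% CUDA kernel, keeping the shared arrays A, B, C in __constant__ memory.
% Assumes nvcc is on the system path.
maxN = 1024;  % capacity of each __constant__ array: 3*1024*8 bytes, under 64 KB
assert(numel(A) <= maxN && numel(B) == numel(A) && numel(C) == numel(A), ...
    'A, B, C must have the same length, at most %d', maxN);

% CUDA source: one thread per element i; every thread reads all of A, B, C.
src = strjoin({ ...
    sprintf('#define MAXN %d', maxN) ...
    '__constant__ double A[MAXN];' ...
    '__constant__ double B[MAXN];' ...
    '__constant__ double C[MAXN];' ...
    '__constant__ int nABC;' ...
    '__global__ void fKernel(double *out, const double *a, const double *b, int n)' ...
    '{' ...
    '    int i = blockIdx.x * blockDim.x + threadIdx.x;' ...
    '    if (i >= n) return;' ...
    '    double acc = 0.0;' ...
    '    for (int k = 0; k < nABC; ++k)' ...
    '        acc += A[k] * a[i] + B[k] * b[i] + C[k];' ...
    '    out[i] = acc;' ...
    '}'}, sprintf('\n'));

% Write the source and compile it to PTX (files go to tempdir).
cuFile  = fullfile(tempdir, 'fKernel.cu');
ptxFile = fullfile(tempdir, 'fKernel.ptx');
fid = fopen(cuFile, 'w');
fprintf(fid, '%s\n', src);
fclose(fid);
status = system(['nvcc -ptx "' cuFile '" -o "' ptxFile '"']);
assert(status == 0, 'nvcc failed to compile %s', cuFile);

% Load the kernel and copy the shared arrays into constant memory once.
k = parallel.gpu.CUDAKernel(ptxFile, cuFile);
setConstantMemory(k, 'A', double(A(:)), 'B', double(B(:)), ...
                     'C', double(C(:)), 'nABC', int32(numel(A)));

% Launch one thread per element of a and b; out is the only non-const
% pointer argument, so it is the only output returned by feval.
n = numel(a);
k.ThreadBlockSize = [256 1 1];
k.GridSize = [ceil(n / 256) 1];
out = zeros(n, 1, 'gpuArray');
out = feval(k, out, gpuArray(double(a(:))), gpuArray(double(b(:))), int32(n));
y = gather(out);
end
```

If f is called repeatedly with the same A, B, C, the compile and setConstantMemory steps could be done once and only the feval repeated.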

Mikhail on 11 Nov 2014
You can try calling your function without arrayfun. If at least one of the arguments is a gpuArray, the GPU-enabled operations inside it will be performed on the GPU.
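A sketch of that approach, assuming f can be written with whole-array operations. The body below is a hypothetical placeholder for f (sum over k of A(k)*a(i) + B(k)*b(i) + C(k)), and fVectorized is an illustrative name. It builds an n-by-m matrix of terms with bsxfun (rows index a and b, columns index the shared arrays), so it trades an n*m temporary for avoiding any per-element function calls.

```matlab
function y = fVectorized(a, b, A, B, C)
% Sketch: the hypothetical f written with whole-array operations.
% If a or b is a gpuArray, bsxfun and sum run on the GPU; no arrayfun needed.
% Example call:  y = gather(fVectorized(gpuArray(rand(1e5,1)), ...
%                    gpuArray(rand(1e5,1)), rand(50,1), rand(50,1), rand(50,1)));
terms = bsxfun(@times, a(:), A(:).') ...   % n-by-m: A(k)*a(i)
      + bsxfun(@times, b(:), B(:).');      % n-by-m: B(k)*b(i)
terms = bsxfun(@plus, terms, C(:).');      % add C(k) to every row
y = sum(terms, 2);                         % reduce over the shared index k
end
```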

Edric Ellis on 12 Nov 2014
Can you give a more concrete example of what you'd like to do with A, B, and C? You might be able to use a nested function with up-level variables. This example is quite complex, but it shows some of the more advanced things you can do with nested functions and arrayfun. In particular, the nested function updateParentGrid accesses the up-level variable grid and indexes into it to perform the stencil computation.
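A minimal sketch of the nested-function pattern, assuming a release whose GPU arrayfun lets a nested function read (but not assign to) up-level variables and index into them. Only a and b are passed element-wise; A, B, C live in the parent workspace. The body of fElement is a hypothetical placeholder for f (sum over k of A(k)*a(i) + B(k)*b(i) + C(k)), and fNestedArrayfun and fElement are illustrative names.

```matlab
function y = fNestedArrayfun(a, b, A, B, C)
% Sketch: GPU arrayfun over a and b only; the shared arrays A, B, C are
% up-level variables of the nested function and are only read there.
A = gpuArray(A);
B = gpuArray(B);
C = gpuArray(C);
nABC = numel(A);  % loop bound computed once, outside the element-wise body
y = arrayfun(@fElement, gpuArray(a), gpuArray(b));  % result is a gpuArray

    function yi = fElement(ai, bi)
        % Runs once per element pair (ai, bi); indexes into up-level A, B, C.
        yi = 0;
        for k = 1:nABC
            yi = yi + A(k) * ai + B(k) * bi + C(k);
        end
    end
end
```

Whether this beats a constant-memory CUDA kernel depends on how the up-level arrays are read by each thread, so timing both on the real f would be the way to decide.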
  1 Comment
Matt J on 14 Nov 2014
Edited: Matt J on 14 Nov 2014
But can it be efficient to do this? I assume that there are CUDA threads doing each element-wise computation under the hood. If all threads need the variables A,B, and C, then surely those variables would need to be stored in constant memory in order for all threads to access them quickly enough.

