What functions need GPU support

Recently there have been a few posts about functions that do not fully support gpuArray and that could benefit from more intensive GPU support from TMW.
I am opening this thread so that users can submit their wishes and explain a typical use case and why a given GPU feature is important to them.
I put here a list by category, but if you have a specific function that does not fall into any category, you are welcome to add it.
Basic array arithmetic and linear algebra seem pretty well covered (?), but I do not know whether anything is missing from this huge library.
The following areas, however, still need to be investigated:
  • Optimization functions, especially the gradient method where the gradient calculation can be performed in parallel on GPU
  • ODE functions where the Jacobian can be estimated in parallel
  • Interpolation functions where the preparation step and inquiring step can be both parallelized
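As an illustration of the first wish, here is a minimal sketch (the function name `fdGradientGPU` is hypothetical) of how a finite-difference gradient could be batched on the GPU, assuming the objective `f` is vectorized so that each column of a matrix input is treated as a separate point:

```matlab
% Hypothetical sketch: estimate the gradient of f at x by evaluating all
% n forward perturbations at once on the GPU. Assumes f is vectorized so
% that f(X) returns a 1-by-size(X,2) row of objective values, and that
% x is an n-by-1 gpuArray.
function g = fdGradientGPU(f, x, h)
    n  = numel(x);
    X  = x + h*gpuArray.eye(n);   % n perturbed points, one per column
    fx = f(x);                    % base value
    g  = (f(X) - fx).' / h;       % all n difference quotients in parallel
end
```

Whether this beats a sequential CPU loop depends entirely on how expensive one evaluation of `f` is; it is a sketch of the parallelization pattern, not a benchmark.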

8 Comments

Hi Bruno, thanks for opening this topic. Could you give a bit more detail about "Interpolation functions"? Basic interpolation functions such as interp1,2,3,n have been supported on the GPU for a long time. It would be good to know which other interpolation functions you need. (As you say above, knowing the use-case would also help us prioritize.)
If this is mentioned in other posts, could you link them here for reference?
@Ben Tordoff Hello, I gave two links in the answer below (so we can discuss interpolation in a separate answer).
Matt J
Matt J on 29 Aug 2023
Edited: Matt J on 29 Aug 2023
Optimization functions, especially the gradient method where the gradient calculation can be performed in parallel on GPU
Sadly, TMW have heard this proposal and disagree that it would be beneficial. Transcript below:
GPU array support for Optimization Toolbox
Inbox
Matt J
Jun 12, 2023, 10:58AM
Hi Mike,
In addition to gpuArray support for griddedInterpolant, I was also wondering about enabling gpuArray types for the Optimization Toolbox solvers. The Optimization Toolbox implements a number of iterative function minimization methods, where the function to be minimized is specified by a user-defined function handle. Currently, the toolbox solvers work only with CPU-double data and the user-supplied function is required to return its results in CPU-double form. This introduces a lot of bottlenecks. It would be good if these solvers could work entirely on the GPU. This shouldn't be too hard to enable, since all of the operations that the toolbox solvers perform on doubles are probably already implemented for gpuArrays as well.
Mike Croucher
Jun 14, 2023, 11:30AM
to me
Hi Matt
I can confirm that there is definitely gpuArray support for griddedInterpolant in R2023b. The pre-release will be available soon so you can try it out.
Regarding gpuArray support for optimisation solvers. There are currently no plans to do so.
One of our developers recently worked on a Proof of Concept for a customer solving a large set of nonlinear equations using GPU arrays and the speed up was marginal, at best.
Elsewhere, there is no evidence of GPU use by the usual competitors (Gurobi, CPLEX etc) and there seems to be similar conclusions in the open source world.
With that said, if you know of any evidence showing GPU acceleration of optimization algorithms that are relevant to your work, I’d be interested in knowing it.
With respect to your own optimization problems. Given that GPU acceleration seems to be off the table. What else might we try? Do you have something concrete I could look at?
Best Wishes,
Mike
Mike Croucher
Customer Success Engineer, MathWorks
The real benefit perhaps depends on how well the user-supplied function exploits the GPU.
IMO there is no clear-cut divide between Matt's and Mike's positions.
@Matt J However, if you can supply a use case where the GPU is desired, it would be great for a concrete discussion.
The flavor of the issue is illustrated by the code below, which I also shared with TMW. It runs a few iterations of fminunc() to solve a basic set of equations A*x=b on both the CPU and the GPU. On a GTX 1080 Ti, I see nearly a 3x speed-up. However, as you can see in the user-provided objective() code, I am forced to use gather() to send the results back to the CPU every time the objective is invoked. As you make N smaller (e.g., N=500), this becomes a bottleneck and the GPU is outperformed by the CPU by a factor of 3. If you additionally set SpecifyObjectiveGradient=false, it is outperformed by a factor of 10.
N = 8e3;
opts = optimoptions('fminunc','Display','none','MaxIterations',4,...
    'SpecifyObjectiveGradient',true,'Algorithm','quasi-newton',...
    'HessUpdate','steepdesc');

% CPU
A = rand(N);
b = A*rand(N,1);
tic;
x = fminunc(@(x) objective(x,A,b), ones(N,1), opts);
toc % Elapsed time is 3.915851 seconds.
disp ' '

% GPU
A = gpuArray(A);
b = gpuArray(b);
tic
x = fminunc(@(x) objective(x,A,b), ones(N,1), opts);
toc % Elapsed time is 1.564494 seconds.

function [fval,grad] = objective(x,A,b)
    err  = A*x - b;
    fval = norm(err).^2/2;
    fval = gather(fval);       % forced transfer back to the CPU
    if nargout > 1
        grad = A.'*err;
        grad = gather(grad);   % forced transfer back to the CPU
    end
end
Same code, but only a factor of 1.7 on my side:
CPU: Elapsed time is 0.972671 seconds.
GPU (with gather): Elapsed time is 0.569152 seconds.
And for N=500 with/without SpecifyObjectiveGradient?
I ran it several times and selected the best. The CPU is much faster:
  • CPU : Elapsed time is 0.109399 seconds.
  • GPU: Elapsed time is 2.010831 seconds.


Answers (3)

Bruno Luong
Bruno Luong on 29 Aug 2023

4 Comments

Matt J
Matt J on 29 Aug 2023
Edited: Matt J on 29 Aug 2023
Note from my correspondence above, where Mike Croucher from TMW says:
I can confirm that there is definitely gpuArray support for griddedInterpolant in R2023b. The pre-release will be available soon so you can try it out.
I haven't tried the pre-release, though, to see if all functionality has been implemented.
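For reference, a minimal sketch of what gpuArray use with griddedInterpolant could look like, assuming the R2023b support works as described above (untested against the pre-release):

```matlab
% Assumes R2023b gpuArray support for griddedInterpolant, as stated above.
x  = gpuArray.linspace(0, 2*pi, 1000);     % sample grid built on the GPU
v  = sin(x);                               % sample values on the GPU
F  = griddedInterpolant(x, v, 'linear');   % interpolant from gpuArray data
xq = gpuArray.rand(1, 1e6) * 2*pi;         % query points stay on the GPU
vq = F(xq);                                % evaluation on the GPU
```

The point of such support would be that xq and vq never leave the device, avoiding the gather() bottleneck discussed in the optimization example above.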
Michal
Michal on 30 Aug 2023
Edited: Michal on 30 Aug 2023
@Matt J I just tested 1D griddedInterpolant (Method "spline" and "linear") in R2023b with gpuArrays and, at least on my midrange GPU (NVIDIA Quadro A1000), the overall speed-up is not significant, even when the GPU -> CPU data transfer is not taken into account. With some high-end GPU the performance could be better, but my current tests show a maximum speed-up of ~5-15% (and, for a small number of query points, a slowdown of ~25-75%).
Finally, so far nothing impressive ... :(
Matt J
Matt J on 30 Aug 2023
Edited: Matt J on 30 Aug 2023
@Michal You should probably present your tests.
Michal
Michal on 30 Aug 2023
Edited: Michal on 30 Aug 2023
@Matt J I should delete my comment, because I just realized that none of my timing results are reliably reproducible. The measured timings depend strongly on the installed version of the NVIDIA driver (driver 535.x vs 525.x), at least on Ubuntu Linux 22.04. Please ignore my comment!


Matt J
Matt J on 26 Aug 2023
Edited: Matt J on 26 Aug 2023
Sparse array indexing would be one example, e.g.
>> A=gpuArray.speye(5)
A =
(1,1) 1
(2,2) 1
(3,3) 1
(4,4) 1
(5,5) 1
>> A(1,:)
Error using indexing
Sparse gpuArrays do not support indexing.
The lack of this might be the reason why some of what you've listed in the OP is not supported. I'm sure a number of the Optimization Toolbox functions need to support indexing.

2 Comments

It looks like an important basic piece of the library is missing.
Bruno Luong
Bruno Luong on 29 Aug 2023
Edited: Bruno Luong on 29 Aug 2023
One of the reasons why sparse indexing code on the GPU and CPU is not directly "transposable" is that storage on the CPU is compressed sparse column (CSC), whereas the GPU uses compressed sparse row (CSR).
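To illustrate the CSC side on the CPU: column access maps directly onto the storage layout, which is one reason CPU sparse indexing code would not port verbatim to a CSR layout. A small sketch:

```matlab
% CPU sparse matrices are stored column-wise (CSC), so extracting a column
% is a contiguous read of the stored entries, while extracting a row must
% scan the index list of every column.
A = sparse([1 3 2], [1 1 3], [10 20 30], 3, 3);
col = A(:,1);          % contiguous in CSC storage: cheap
row = A(2,:);          % must search each column's index list: costly
[i, j, v] = find(A);   % entries come back in column-major (CSC) order
```

On a CSR device format the situation is mirrored: row slicing is the cheap operation, so the two indexing code paths cannot simply be shared.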


Asked: on 26 Aug 2023
Edited: on 30 Aug 2023
