Why can this loop not be parallelized in Matlab Coder?

I am using MEX code to speed up a part of my computations and I would expect parallel computations to help at that. Automatic Parallelization is enabled, but even though the code is quite simple Coder refuses to parallelize it. In the coder report I see "Array or variable access pattern inside the loop is not suitable for parallel execution.". Another issue in the report asks me to enable "OptimizeReductions" to parallelize the line where the inverse is computed.
In the code below, you can see that, essentially, the function performs some pagewise operations on a large 3D matrix M. This seems like an obvious case for sliced variables to me.
I do not understand why this cannot be parallelized. What am I missing?
function [dets, Minv] = getDets(tri, xVrtx)
% Compute determinants of all simplices in the triangulation.
% Return the inverse of the characteristic matrix.
% tri ... nSmplx x nDim+1 matrix of vertex indices
% xVrtx ... nVrtx x nDim matrix of vertex coordinates
% dets ... nSmplx x 1 vector of determinants
% Minv ... nDim+1 x nDim+1 x nSmplx inverse of characteristic matrix
tri = int32(tri);
nDim = size(xVrtx, 2);
nSmplx = size(tri, 1);
% characteristic matrices of all simplices
% nDim+1 x nDim+1 x nSmplx
M = [reshape(xVrtx(tri',:)', [nDim nDim+1 nSmplx]); ones([1 nDim+1 nSmplx])];
% allocate memory
dets = zeros(nSmplx, 1);
Minv = zeros(nDim+1, nDim+1, nSmplx);
for j = 1:nSmplx
    M_ = M(:,:,j);
    dets(j) = det(M_);
    if dets(j) > 0  % note: dets(j), not det(j)
        Minv(:,:,j) = inv(M_);
    end
end
end % function

9 Comments

Experiment with
Zn = zeros(nDim+1, nDim+1);
for j = 1:nSmplx
    M_ = M(:,:,j);
    detj = det(M_);
    if detj > 0
        Minv_ = inv(M_);
    else
        Minv_ = Zn;
    end
    dets(j) = detj;
    Minv(:,:,j) = Minv_;
end
Note that you could try this
M = rand(3,3,10)
pageinv(M)
% or pagemldivide(M, eye(3))
instead of writing a MEX version of the for loop. The page* functions are internally multithreaded, so this would be fast.
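As a minimal sketch of that suggestion (pageinv and pagemldivide require R2022a or newer; the sizes here are stand-ins for the characteristic matrices above):

```matlab
M = rand(4, 4, 10);               % stand-in for nDim+1 x nDim+1 x nSmplx pages
Minv1 = pageinv(M);               % explicit pagewise inverse
Minv2 = pagemldivide(M, eye(4));  % pagewise M \ I; usually the better-conditioned choice
```

Both calls run multithreaded internally, so no MEX generation or explicit loop is needed for this part.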
It looks like you want to compute barycentric coordinates of a mesh.
Hey Walter, hey Bruno,
thank you very much for your help!
I wasn't aware of the pagemldivide function and it looks like a good solution for this specific problem. I will definitely test it. The code snippet that I am showing here is indeed for computing barycentric coordinates.
There is, however, a bit more happening in my code, so I would like to understand what the issue with the parallelization in the snippet above is. I played around with your suggestion, Walter, and the branch does not seem to be the issue.
I boiled it down to this
function [dets, Minv] = getDets(tri, xVrtx)
tri = int32(tri);
nDim = size(xVrtx, 2);
nSmplx = size(tri, 1);
% characteristic matrices of all simplices
M = [reshape(xVrtx(tri',:)', [nDim nDim+1 nSmplx]); ones([1 nDim+1 nSmplx])];
% allocate memory
dets = zeros(nSmplx, 1);
Minv = zeros(nDim+1, nDim+1, nSmplx);
for j = 1:size(Minv, 3)
    M_ = M(:,:,j);
    dets(j) = det(M_);
    Minv(:,:,j) = inv(M_);
end
end % function
So no branching whatsoever. Only extracting one page, performing computations on it, and storing the results. Storing the output of det and inv in intermediate variables before writing them to the sliced variables did not change anything. And here is the corresponding section of the report:
Is there a way to explicitly flag a variable as "sliced"? Or is this not even the reason and it is a problem with the det and inv functions? I have the impression that the compiler does not know the expected size of the output of these two functions.
Just for an experiment can you try this for loop
Minv = zeros(nDim+1, nDim+1, nSmplx);
for j = 1:size(Minv, 3)
    Minv(:,:,j) = M(:,:,j) \ eye(nDim+1);
end
I ran the experiment that you suggested Bruno, and I still get the same issues in the report as before:
By now I have also installed the MSVC compiler in addition to MinGW and tested both C and C++ code generation. Same result...
I also just ran an experiment with automatic parallelization disabled and instead explicitly telling the compiler to parallelize the loop.
function Minv = multiInv(M)
n = size(M, 1);
Minv = zeros(size(M));
coder.loop.parallelize("j");
for j = 1:size(M, 3)
    Minv(:,:,j) = M(:,:,j) \ eye(n);
end
end
Here, I get a new warning, "coder.loop function is ignored as the loop is not perfectly nested", which makes no sense to me.
Any more ideas what I could try or what the issue may be?
Maybe I'm wrong, but it seems Coder can only parallelize loops with simple arithmetic operations using OpenMP clauses. Calling functions such as det, inv, or mldivide does not seem to be supported at present.
Note that your for loop can be transformed into a parfor loop.
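A parfor version of the original loop might look like the sketch below. Here dets and Minv are sliced variables (each iteration touches only page j), which is exactly the pattern parfor requires; running on actual workers needs the Parallel Computing Toolbox.

```matlab
dets = zeros(nSmplx, 1);
Minv = zeros(nDim+1, nDim+1, nSmplx);
parfor j = 1:nSmplx
    M_ = M(:,:,j);          % sliced input: one page per iteration
    dets(j) = det(M_);      % sliced output
    Minv(:,:,j) = inv(M_);  % sliced output
end
```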
Dear Bruno,
thank you for your help! Parfor sort of solves my problem so I'll work with that. With this I'll give up on parallelization in Coder for now. I don't want to bother you with this anymore :)
-----------------------------------
For completeness' sake, here are the things that I tried over the last couple of days, all of which did not work:
Based on your comment I spent a while testing and finally tried this code, which I adapted from the MultipleQR function that you posted on the MATLAB File Exchange and which only contains arithmetic operations.
function X = backsubs(R, Q)
% solve R * X = Q'
% where R is an upper triangular matrix
% used to compute matrix inverse based on its QR decomposition
coder.noImplicitExpansionInFunction
r = size(R, 2);
n = size(Q, 2);
nP = size(R, 3);
X = zeros(r, n, nP);
% coder.loop.parallelize("k");
for k = 1:nP
    Rk = R(:,:,k);
    Qk = Q(:,:,k);
    Xk = zeros(r, n);
    Rii = diag(Rk);
    for i = r:-1:1
        Xk(i,:) = (Qk(:,i)' - Rk(i,i+1:end)*Xk(i+1:end,:)) ./ Rii(i);
    end
    X(:,:,k) = Xk;
end
end
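For context, a hypothetical driver for backsubs (the pagewise Q and R factors would come from the MultipleQR function mentioned above; here a plain qr loop builds them just for testing). Since M = Q*R with Q orthogonal, solving R*X = Q' pagewise yields X = inv(M) for each page:

```matlab
n = 4; nP = 5;
M = rand(n, n, nP);
Q = zeros(n, n, nP);
R = zeros(n, n, nP);
for k = 1:nP
    [Q(:,:,k), R(:,:,k)] = qr(M(:,:,k));  % stand-in for MultipleQR
end
Minv = backsubs(R, Q);  % Minv(:,:,k) == inv(M(:,:,k)) up to rounding
```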
With automatic parallelization it says "Array or variable access pattern inside the loop is not suitable for parallel execution." With explicit coder.loop.parallelize("k") active it says "Coder loop function ignored because the loop is not perfectly nested." I tried it on Windows with MinGW and MSVC and on Linux. Regardless of what I try, it just won't parallelize my code. Could it be that parallelization doesn't work with 3D arrays? Could it be a bug? Or do I just not get how parallelization works?
Hi @Felix Birkelbach, it won't be possible to parallelize your code here since the nested for loop contains iterations that depend on other iterations, i.e. Xk(i,:) depends on Xk(i+1:end,:). Restructuring your code to remove this dependency should fix your problems with parallelization.
Hey Divyam,
thanks for the input. You're right in that the i loop cannot be parallelized. That is not the issue, though. The k loop should be parallelizable, but it does not work. This is the part that I struggle with.

Answers (0)

Release: R2023b
Asked: 5 Nov 2024
Commented: 15 Dec 2024
