Why can this loop not be parallelized in Matlab Coder?

I am using MEX code to speed up a part of my computations and I would expect parallel computations to help at that. Automatic Parallelization is enabled, but even though the code is quite simple Coder refuses to parallelize it. In the coder report I see "Array or variable access pattern inside the loop is not suitable for parallel execution.". Another issue in the report asks me to enable "OptimizeReductions" to parallelize the line where the inverse is computed.
In the code below, you can see that, essentially, the function performs some pagewise operations on a large 3D matrix M. This seems like an obvious case for sliced variables to me.
I do not understand why this cannot be parallelized. What am I missing?
function [dets, Minv] = getDets(tri, xVrtx)
% Compute determinants of all simplices in the triangulation.
% Return the inverse of the characteristic matrix.
% tri ... nSmplx x nDim+1 matrix of vertex indices
% xVrtx ... nVrtx x nDim matrix of vertex coordinates
% dets ... nSmplx x 1 vector of determinants
% Minv ... nDim+1 x nDim+1 x nSmplx inverse of characteristic matrix
tri = int32(tri);
nDim = size(xVrtx, 2);
nSmplx = size(tri, 1);
% characteristic matrices of all simplices
% nDim+1 x nDim+1 x nSmplx
M = [reshape(xVrtx(tri',:)', [nDim nDim+1 nSmplx]); ones([1 nDim+1 nSmplx])];
% allocate memory
dets = zeros(nSmplx, 1);
Minv = zeros(nDim+1, nDim+1, nSmplx);
for j = 1:nSmplx
    M_ = M(:,:,j);
    dets(j) = det(M_);
    if dets(j) > 0  % note: dets(j), not det(j)
        Minv(:,:,j) = inv(M_);
    end
end
end % function

9 Comments

Experiment with
Zn = zeros(nDim+1, nDim+1);
for j = 1:nSmplx
    M_ = M(:,:,j);
    detj = det(M_);
    if detj > 0
        Minv_ = inv(M_);
    else
        Minv_ = Zn;
    end
    dets(j) = detj;
    Minv(:,:,j) = Minv_;
end
Note that you could try this
M = rand(3,3,10)
pageinv(M)
% or pagemldivide(M, eye(3))
instead of writing a MEX version of the for loop. The page* functions are internally multithreaded, so this would be fast.
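As a minimal sketch of that suggestion (pageinv and pagemldivide require R2022a or newer; the sizes here are stand-ins for the characteristic matrices above):

```matlab
M = rand(4, 4, 10);               % stand-in for nDim+1 x nDim+1 x nSmplx pages
Minv1 = pageinv(M);               % explicit pagewise inverse
Minv2 = pagemldivide(M, eye(4));  % pagewise M \ I; usually the better-conditioned choice
```

Both calls run multithreaded internally, so no MEX generation or explicit loop is needed for this part.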
It looks like you want to compute barycentric coordinates of a mesh.
Hey Walter, hey Bruno,
thank you very much for your help!
I wasn't aware of the pagemldivide function and it looks like a good solution for this specific problem. I will definitely test it. The code snippet that I am showing here is indeed for computing barycentric coordinates.
There is, however, a bit more happening in my code, so I would like to understand what the issue with the parallelization in the snippet above is. I played around with your suggestion, Walter, and the branch does not seem to be the issue.
I boiled it down to this
function [dets, Minv] = getDets(tri, xVrtx)
tri = int32(tri);
nDim = size(xVrtx, 2);
nSmplx = size(tri, 1);
% characteristic matrices of all simplices
M = [reshape(xVrtx(tri',:)', [nDim nDim+1 nSmplx]); ones([1 nDim+1 nSmplx])];
% allocate memory
dets = zeros(nSmplx, 1);
Minv = zeros(nDim+1, nDim+1, nSmplx);
for j = 1:size(Minv, 3)
    M_ = M(:,:,j);
    dets(j) = det(M_);
    Minv(:,:,j) = inv(M_);
end
end % function
So no branching whatsoever. Only extracting one page, performing computations on it, and storing the results. Storing the output of det and inv in intermediate variables before writing them to the sliced variables did not change anything. And here is the corresponding section of the report:
Is there a way to explicitly flag a variable as "sliced"? Or is this not even the reason and it is a problem with the det and inv functions? I have the impression that the compiler does not know the expected size of the output of these two functions.
Just for an experiment can you try this for loop
Minv = zeros(nDim+1, nDim+1, nSmplx);
for j = 1:size(Minv, 3)
    Minv(:,:,j) = M(:,:,j) \ eye(nDim+1);
end
I ran the experiment that you suggested Bruno, and I still get the same issues in the report as before:
By now I have also installed the MSVC compiler in addition to MinGW and tested both C and C++ code generation. Same result...
I also just ran an experiment with automatic parallelization disabled and instead explicitly telling the compiler to parallelize the loop.
function Minv = multiInv(M)
n = size(M, 1);
Minv = zeros(size(M));
coder.loop.parallelize("j");
for j = 1:size(M, 3)
    Minv(:,:,j) = M(:,:,j) \ eye(n);
end
end
Here, I get a new warning, "coder.loop function is ignored as the loop is not perfectly nested", which makes no sense to me.
Any more ideas what I could try or what the issue may be?
Maybe I'm wrong, but it seems Coder can only parallelize loops with simple arithmetic operations using OpenMP clauses. Calling functions such as det, inv, or mldivide does not seem to be supported at present.
Note that your for loop can be transformed into a parfor loop.
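A parfor version of the original loop might look like the sketch below. Here dets and Minv are sliced variables (each iteration touches only page j), which is exactly the pattern parfor requires; running on actual workers needs the Parallel Computing Toolbox.

```matlab
dets = zeros(nSmplx, 1);
Minv = zeros(nDim+1, nDim+1, nSmplx);
parfor j = 1:nSmplx
    M_ = M(:,:,j);          % sliced input: one page per iteration
    dets(j) = det(M_);      % sliced output
    Minv(:,:,j) = inv(M_);  % sliced output
end
```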
Dear Bruno,
thank you for your help! Parfor sort of solves my problem so I'll work with that. With this I'll give up on parallelization in Coder for now. I don't want to bother you with this anymore :)
-----------------------------------
For completeness' sake, here are the things that I tried over the last couple of days, all of which did not work:
Based on your comment I spent a while testing and finally tried this code, which I adapted from the MultipleQR function that you posted on the MATLAB File Exchange and which only contains arithmetic operations.
function X = backsubs(R, Q)
% solve R * X = Q'
% where R is an upper triangular matrix
% used to compute matrix inverse based on its QR decomposition
coder.noImplicitExpansionInFunction
r = size(R, 2);
n = size(Q, 2);
nP = size(R, 3);
X = zeros(r, n, nP);
% coder.loop.parallelize("k");
for k = 1:nP
    Rk = R(:,:,k);
    Qk = Q(:,:,k);
    Xk = zeros(r, n);
    Rii = diag(Rk);
    for i = r:-1:1
        Xk(i,:) = (Qk(:,i)' - Rk(i,i+1:end)*Xk(i+1:end,:)) ./ Rii(i);
    end
    X(:,:,k) = Xk;
end
end
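For context, a hypothetical driver for backsubs (the pagewise Q and R factors would come from the MultipleQR function mentioned above; here a plain qr loop builds them just for testing). Since M = Q*R with Q orthogonal, solving R*X = Q' pagewise yields X = inv(M) for each page:

```matlab
n = 4; nP = 5;
M = rand(n, n, nP);
Q = zeros(n, n, nP);
R = zeros(n, n, nP);
for k = 1:nP
    [Q(:,:,k), R(:,:,k)] = qr(M(:,:,k));  % stand-in for MultipleQR
end
Minv = backsubs(R, Q);  % Minv(:,:,k) == inv(M(:,:,k)) up to rounding
```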
With automatic parallelization it says "Array or variable access pattern inside the loop is not suitable for parallel execution." With explicit coder.loop.parallelize("k") active it says "Coder loop function ignored because the loop is not perfectly nested." I tried it on Windows with MinGW and MSVC and on Linux. Regardless of what I try, it just won't parallelize my code. Could it be that parallelization doesn't work with 3D arrays? Could it be a bug? Or do I just not get how parallelization works?
Hi @Felix Birkelbach, it won't be possible to parallelize your code here since the nested for loop contains iterations that depend on other iterations, i.e. Xk(i,:) depends on Xk(i+1:end,:). Restructuring your code to remove this dependency should fix your problems with parallelization.
Hey Divyam,
thanks for the input. You're right in that the i loop cannot be parallelized. That is not the issue, though. The k loop should be parallelizable, but it does not work. This is the part that I struggle with.

Answers (0)

Release: R2023b
Asked: 5 Nov 2024
Commented: 15 Dec 2024
