Parfor getting slower than a normal for loop

Hi all,
I am trying to parallelize the execution of a function in MATLAB.
Specifically, I am running the following code:
clear
%reduce the size of A if necessary.
A=rand(1000,60);
B=int8(rand(1000,60)>0.5);
lambda_tol=0.000000000000004;
N=10;
G=15;
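% Regularization weights 2^h for h = -G:0.1:G
h_values = -G:0.1:G;
lambda_tol_vector = zeros(numel(h_values),1); % preallocate to the actual length
for conto = 1:numel(h_values)
    lambda_tol_vector(conto) = 2^h_values(conto);
end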
M=4;
tol = 1e-9;
tic
CompletedMat_nopar={};
%using normal for loop:
for k = 1:size(lambda_tol_vector,1)
    fprintf('Completion using nuclear norm regularization... \n');
    [CompletedMat,objective,flag] = matrix_completion_nuclear_GG_alt(A.*double(B),double(B),N,lambda_tol_vector(k),tol);
    if flag==1
        % store zeros when the solver returns flag==1
        CompletedMat_nopar{k}=zeros(size(A));
    else
        CompletedMat_nopar{k}=CompletedMat;
    end
end
toc
%1000x60 --> Elapsed time is 55.271974 seconds
%using a parfor loop:
tic
CompletedMat_par={};
parfor (k = 1:size(lambda_tol_vector,1),M)
    fprintf('Completion using nuclear norm regularization... \n');
    [CompletedMatpar,objective_par,flag_par] = matrix_completion_nuclear_GG_alt(A.*double(B),double(B),N,lambda_tol_vector(k),tol);
    if flag_par==1
        % store zeros when the solver returns flag_par==1
        CompletedMat_par{k}=zeros(size(A));
    else
        CompletedMat_par{k}=CompletedMatpar;
    end
end
toc
%1000x60 --> Elapsed time is 95.671825 seconds
You can see the function matrix_completion_nuclear_GG_alt attached.
I cannot understand why the parfor loop runs so much slower than the normal for loop. Does it depend on the matrix_completion_nuclear_GG_alt function or on the syntax of the parfor?
EDIT: in case it helps, here is my ticBytes and tocBytes output:
         BytesSentToWorkers    BytesReceivedFromWorkers
         __________________    ________________________
1              84744                 2.4034e+07
2              87328                 2.6919e+07
3              84712                 2.2112e+07
4              84752                 2.4516e+07
5              84744                 2.4034e+07
6              82192                 2.3072e+07
Total       5.0847e+05               1.4469e+08

7 Comments

Matt J on 19 Nov 2022
Edited: Matt J on 19 Nov 2022
Probably the former. Has the parpool been created already? Your code doesn't show where.
@Matt J no, I did not. I am totally new to the parfor toolbox, so I am not familiar with the workflow.
The function is actually quite simple but apparently takes a long time to run. I cannot find the bottleneck, however.
Create the parpool first. Then perform the timing tests.
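A minimal sketch of that workflow, assuming Parallel Computing Toolbox is installed and that the default profile allows 4 workers (an assumption, chosen to match M=4 in the question). The pause call is only a hypothetical stand-in for the real computation; the point is that the pool is started before the timer, so pool startup is not counted in the measurement.

```matlab
% Start the pool once, outside the timed region (reuse it if it already exists).
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool(4);          % 4 workers is an assumption, matching M=4
end

nIter = 8;                      % illustrative iteration count
out = zeros(nIter,1);

ticBytes(pool);                 % start counting bytes sent to / received from workers
tStart = tic;
parfor k = 1:nIter
    pause(0.5);                 % hypothetical stand-in for the real computation
    out(k) = k^2;
end
tPar = toc(tStart);
tocBytes(pool)                  % display the per-worker transfer table

fprintf('parfor time with a pre-started pool: %.2f s\n', tPar);
```

Running the timed section a second time is also worthwhile, since the first parfor on a fresh pool may include one-off setup costs on the workers.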
Generally speaking, even without taking into account the time required to create a parpool, it is common for parfor to be slower than not using parfor:
  • the communications overhead to send data to the worker and to collect the results can add up to a lot, especially if you have large broadcast variables
  • by default, workers only get one core. If you are doing mathematics that could benefit from multiple cores, the resulting slowdown can be noticeable. For example, in normal MATLAB, taking the sum of a large array can be done in parallel over N cores by dividing the data into N pieces and having one core sum each piece, taking about 1/N times as long as running the sum in series. (In practice there would be efficiency considerations; you would want to split the data at memory page boundaries so that the data segments could be flipped directly into the memory of the worker without having to do any copying.)
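One way to see the one-core-per-worker default on a given machine is to compare the computational thread count on the client with the count on each worker. This is a sketch, assuming Parallel Computing Toolbox and a process-based pool; gcp starts a default pool if none is running and automatic pool creation is enabled in the preferences.

```matlab
pool = gcp;                                      % current pool, or a new default one
clientThreads = maxNumCompThreads;               % threads available to the client session

% Ask every worker for its own thread limit (one output per worker).
f = parfevalOnAll(pool, @maxNumCompThreads, 1);
workerThreads = fetchOutputs(f);                 % column vector, one entry per worker

fprintf('Client threads: %d\n', clientThreads);
fprintf('Threads per worker: %s\n', mat2str(workerThreads.'));
```

If the workers report a single thread while the client reports several, any built-in multithreading that the serial loop benefits from is not available inside the parfor body.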
@Walter Roberson I see. However, in my case the data cannot be split, since the algorithm I am running needs the information about the entire matrix. What I can do is split the "for k=..." loop. In this case, is there an efficient way to perform such a split? I mean, if I understood correctly I should:
  1. Split the "for k=..." loop into N mini-loops;
  2. Tell each core to perform a for mini-loop
  3. Wrap up the results of step 2 for each core
Being new to parallelization in MATLAB, however, it is not clear to me how to perform steps 2 and 3. Could you please provide a MWE or a link where this is implemented?
Thank you in advance!
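For reference, a plain parfor already carries out all three steps: it partitions the iteration range among the workers, runs each chunk on a worker, and reassembles sliced outputs on the client. The sketch below illustrates this with a hypothetical stand-in computation (soft-thresholding of singular values), not the attached matrix_completion_nuclear_GG_alt; the sizes and variable names are illustrative, and Parallel Computing Toolbox is assumed.

```matlab
lambdas = 2.^(-2:0.5:2).';          % illustrative weights, one per iteration
A = rand(200,30);                   % broadcast variable: sent to every worker
results = cell(numel(lambdas),1);   % sliced output: one cell per iteration

parfor k = 1:numel(lambdas)
    % Hypothetical stand-in for the real solver: shrink the singular values.
    [U,S,V] = svd(A,'econ');
    S = max(S - lambdas(k), 0);     % off-diagonal zeros stay zero since lambdas > 0
    results{k} = U*S*V';            % MATLAB gathers results{k} back on the client
end

fprintf('Collected %d results of size %dx%d\n', numel(results), size(results{1}));
```

Whether this ends up faster than the serial loop still depends on the per-iteration cost relative to the transfer overhead and on how much built-in multithreading the serial version was already using.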
Stephen23 on 21 Nov 2022
Edited: Stephen23 on 21 Nov 2022
"However, in my case the data cannot be split, since the algorithm I am running needs the information about the entire matrix. What I can do is split the "for k=..." loop. In this case, is there an efficient way to perform such a split? I mean, if I understood correctly I should:"
No, that is the complete opposite of what Walter Roberson was telling you. The point is that many MATLAB operations, e.g. SUM, are already multi-threaded. This is automatic and occurs without the user needing to do anything special at all.
This is one reason for the advice that Ayush gave you to vectorize, given in your other thread.
Your attempts to speed things up are most likely going to fail because most of what you are attempting is fighting the actual ways that MATLAB code can be written to be fast and to benefit from the inherent speed and multi-threading of basic MATLAB operators. Learning how to write efficient MATLAB code is the best start to this task. Only once you have shown that reasonable best-practice really is a bottleneck should you start to investigate alternatives.
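As one small illustration of the vectorization advice, the loop that fills lambda_tol_vector in the question can be written as a single elementwise expression. This is a sketch using the values from the question (G=15, step 0.1); lambda_loop and lambda_vec are illustrative names.

```matlab
G = 15;
h = -G:0.1:G;

% Loop version, as in the question (with the preallocation sized to match).
lambda_loop = zeros(numel(h),1);
for k = 1:numel(h)
    lambda_loop(k) = 2^h(k);
end

% Vectorized version: elementwise power over the whole range at once.
lambda_vec = 2.^h(:);

% The two should agree up to floating-point rounding.
relErr = max(abs(lambda_vec - lambda_loop) ./ lambda_loop);
fprintf('%d values, max relative difference: %g\n', numel(lambda_vec), relErr);
```

This particular loop is far too cheap to matter for the timings in the question; the same habit pays off only when applied inside the expensive function itself.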
@Stephen23 thank you, I will dig deeper into efficient MATLAB coding.


Answers (0)

Release: R2021b

Asked: 19 Nov 2022
