Parfor slower than for

I'm trying to reduce the execution time of my code, but parfor takes longer than a normal for loop. My code is the following:
dangle = 1800;
av = zeros(dangle+1);   % preallocate the full (dangle+1)-by-(dangle+1) result
parfor i = 1:dangle+1
    theta1 = pi/dangle*i - pi/dangle;
    for j = 1:dangle+1
        theta2 = pi/dangle*j - pi/dangle;
        T12 = (yy.*(sin(theta1)*cos(theta2))) + (xy.*(cos(theta1)*cos(theta2)))...
            - (yx.*(sin(theta1)*sin(theta2))) - (xx.*(cos(theta1)*sin(theta2)));
        T21 = (yy.*(cos(theta1)*sin(theta2))) - (xy.*(sin(theta1)*sin(theta2)))...
            + (yx.*(cos(theta1)*cos(theta2))) - (xx.*(sin(theta1)*cos(theta2)));
        av(i,j) = mean(mean(abs(nonzeros(T12)))) + mean(mean(abs(nonzeros(T21))));
    end
end
I have also tried starting a parpool before the parfor and closing it afterwards, but that does not seem to help.
Do you have a clue about what is happening? Thanks.

Answers (1)

Walter Roberson on 28 May 2018


parfor uses independent processes. The instructions about what to do have to be sent to the processes, and the data has to be sent to the processes, and the results have to be communicated back from the processes. These are overhead costs, and if you are not asking each process to do "enough" work on the transferred data to make them negligible, then it is fairly common for parfor to be significantly slower than regular for.
Also, by default each worker has access to only one core and so cannot take advantage of the automatic multithreading that MATLAB does on the user's behalf when it detects certain data patterns. If all of the workers are asked to do tasks of the same size, this generally results in the same total time being taken over all of the workers, but the time for each individual task goes up (although several of them are being done at the same time).
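A minimal sketch of the overhead effect described above, assuming the Parallel Computing Toolbox is installed and a pool is available. The variable names (n, data, out_for, out_par) are made up for illustration, and the timings depend entirely on the machine, so no expected output is given. Each iteration does very little work, so the cost of sending instructions and data to the workers is likely to dominate.

```matlab
% Hypothetical micro-benchmark: tiny loop body, so parfor overhead dominates
n = 2000;
data = rand(50);               % small matrix read by every iteration (sent to each worker)
out_for = zeros(1, n);
out_par = zeros(1, n);

tic;
for k = 1:n
    out_for(k) = sum(abs(data(:))) * k;
end
t_for = toc;

tic;
parfor k = 1:n                 % runs on workers when a parallel pool is available
    out_par(k) = sum(abs(data(:))) * k;
end
t_par = toc;

fprintf('for: %.4f s, parfor: %.4f s\n', t_for, t_par);
```

Increasing the work done per iteration is the usual way to see the balance shift in favour of parfor.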
When I look at your code, it seems to me that you should be getting rid of all of your looping and should be vectorizing your calculations.
I know that at the moment the vectorization is not obvious because each nonzeros() result will be a different size, but there is a way around that, mathematically.
Your code mean(mean(abs(nonzeros(T12)))) for a 2D array T12 is the same as sum(abs(T12(:)))./nnz(T12) .
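A quick check of that identity on a small array. The test matrix T is made up for illustration; it contains exact zeros so that nonzeros() actually drops something.

```matlab
% Hypothetical 2D test array containing some exact zeros
T = [1 -2 0; 0 3 -4];

% Original formulation: nonzeros() returns a column vector, so the outer
% mean() acts on a scalar and changes nothing
m_loop = mean(mean(abs(nonzeros(T))));

% Equivalent formulation with no variable-length intermediate result
m_vec = sum(abs(T(:))) ./ nnz(T);

fprintf('%g %g\n', m_loop, m_vec);   % prints "2.5 2.5"
```

Because the second form only needs a sum and a count, it can be applied to every slice of a larger array at once, which is what makes vectorization possible here.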
We do not know what size your variables xx, xy, yx, yy are, but in context the code would make the most sense if they were 2D arrays.
I will have a look for good ways to write the code in vectorized form. I have to go out soon, so the response might be later.

3 Comments

Here is a version that is about 5 times faster than the serial version of your code.
The code below relies on a MATLAB feature introduced in R2016b (implicit expansion). For earlier MATLAB versions, a
if ~exist('xx','var')
    % Random test data, used only when the real arrays are not in the workspace
    xx = rand(5,7)*10-5;
    xy = rand(5,7)*10-5;
    yy = rand(5,7)*10-5;
    yx = rand(5,7)*10-5;
end
tic;
dangle = 1800;
% theta2 runs along the third dimension so all of its values are handled at once
theta2 = reshape(linspace(0, pi, dangle+1), 1, 1, []);
av2 = zeros(dangle+1);   % preallocate the full (dangle+1)-by-(dangle+1) result
for i = 1:dangle+1
    theta1 = pi/dangle*i - pi/dangle;
    T12 = (yy.*(sin(theta1).*cos(theta2))) + (xy.*(cos(theta1).*cos(theta2)))...
        - (yx.*(sin(theta1).*sin(theta2))) - (xx.*(cos(theta1).*sin(theta2)));
    T21 = (yy.*(cos(theta1).*sin(theta2))) - (xy.*(sin(theta1).*sin(theta2)))...
        + (yx.*(cos(theta1).*cos(theta2))) - (xx.*(sin(theta1).*cos(theta2)));
    % Mean absolute value of the nonzero entries of each 2D slice
    part12 = sum(sum(abs(T12),1),2)./sum(sum(T12~=0,1),2);
    part21 = sum(sum(abs(T21),1),2)./sum(sum(T21~=0,1),2);
    av2(i,:) = part12 + part21;
end
toc
surf(av2,'edgecolor','none');
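One way to gain confidence in the partly vectorized version above is to compare it against the original double loop on a coarse grid. The following is a sketch under assumptions: the inputs are random 5-by-7 test arrays, dangle is reduced to 20 so the check runs quickly, and the names av_ref, av_vec, and max_diff are made up for illustration. It uses implicit expansion, so it needs R2016b or later.

```matlab
rng(0);                                   % reproducible random test inputs
xx = rand(5,7)*10-5;  xy = rand(5,7)*10-5;
yx = rand(5,7)*10-5;  yy = rand(5,7)*10-5;
dangle = 20;                              % coarse grid keeps the check fast
theta = linspace(0, pi, dangle+1);

% Reference: original double loop with nonzeros()
av_ref = zeros(dangle+1);
for i = 1:dangle+1
    for j = 1:dangle+1
        t1 = theta(i);  t2 = theta(j);
        T12 = yy.*(sin(t1)*cos(t2)) + xy.*(cos(t1)*cos(t2)) ...
            - yx.*(sin(t1)*sin(t2)) - xx.*(cos(t1)*sin(t2));
        T21 = yy.*(cos(t1)*sin(t2)) - xy.*(sin(t1)*sin(t2)) ...
            + yx.*(cos(t1)*cos(t2)) - xx.*(sin(t1)*cos(t2));
        av_ref(i,j) = mean(abs(nonzeros(T12))) + mean(abs(nonzeros(T21)));
    end
end

% Vectorized over theta2, which runs along the third dimension
theta2 = reshape(theta, 1, 1, []);
av_vec = zeros(dangle+1);
for i = 1:dangle+1
    t1 = theta(i);
    T12 = yy.*(sin(t1).*cos(theta2)) + xy.*(cos(t1).*cos(theta2)) ...
        - yx.*(sin(t1).*sin(theta2)) - xx.*(cos(t1).*sin(theta2));
    T21 = yy.*(cos(t1).*sin(theta2)) - xy.*(sin(t1).*sin(theta2)) ...
        + yx.*(cos(t1).*cos(theta2)) - xx.*(sin(t1).*cos(theta2));
    part12 = sum(sum(abs(T12),1),2) ./ sum(sum(T12~=0,1),2);
    part21 = sum(sum(abs(T21),1),2) ./ sum(sum(T21~=0,1),2);
    av_vec(i,:) = reshape(part12 + part21, 1, []);   % 1x1xN -> 1xN row
end

max_diff = max(abs(av_ref(:) - av_vec(:)));   % expected to be at rounding level
```

The two results can differ in the last few bits because the sums are accumulated in a different order, so the comparison should use a small tolerance rather than exact equality.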
Hello Walter,
Thank you so much for the answer. It is a very clever way to improve the code, and it is also very easy to understand. Unfortunately, I still get an error related to array dimensions, probably because I'm working with MATLAB R2016a.
Your description of multithreading in MATLAB is also very clear. I didn't know that MATLAB uses multithreading when possible. That differs from my previous experience with multithreading using OpenMP. Now it is not very clear to me what the advantage of parfor is when working on a single computer rather than on a cluster.
if ~exist('xx','var')
    % Random test data, used only when the real arrays are not in the workspace
    xx = rand(5,7)*10-5;
    xy = rand(5,7)*10-5;
    yy = rand(5,7)*10-5;
    yx = rand(5,7)*10-5;
end
tic;
dangle = 1800;
% theta2 runs along the third dimension so all of its values are handled at once
theta2 = reshape(linspace(0, pi, dangle+1), 1, 1, []);
av2 = zeros(dangle+1);   % preallocate the full (dangle+1)-by-(dangle+1) result
for i = 1:dangle+1
    theta1 = pi/dangle*i - pi/dangle;
    % bsxfun expands the 2D arrays against the 1-by-1-by-N angle terms
    T12 = bsxfun(@times, yy, (sin(theta1).*cos(theta2))) + bsxfun(@times, xy, (cos(theta1).*cos(theta2)))...
        - bsxfun(@times, yx, (sin(theta1).*sin(theta2))) - bsxfun(@times, xx, (cos(theta1).*sin(theta2)));
    T21 = bsxfun(@times, yy, (cos(theta1).*sin(theta2))) - bsxfun(@times, xy, (sin(theta1).*sin(theta2)))...
        + bsxfun(@times, yx, (cos(theta1).*cos(theta2))) - bsxfun(@times, xx, (sin(theta1).*cos(theta2)));
    % Mean absolute value of the nonzero entries of each 2D slice
    part12 = sum(sum(abs(T12),1),2)./sum(sum(T12~=0,1),2);
    part21 = sum(sum(abs(T21),1),2)./sum(sum(T21~=0,1),2);
    av2(i,:) = part12 + part21;
end
toc
surf(av2,'edgecolor','none');
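The only difference from the earlier version is how the 2D arrays are multiplied by the 1-by-1-by-N angle terms. A small sketch of the two spellings, with made-up arrays A and v: bsxfun works in older releases as well, while the plain .* form with mismatched sizes needs R2016b or later and raises a dimension error before that.

```matlab
A = magic(3);                         % 3x3 matrix
v = reshape([1 10 100], 1, 1, []);    % 1x1x3, values along the third dimension

B1 = bsxfun(@times, A, v);            % explicit expansion, also valid before R2016b
B2 = A .* v;                          % implicit expansion, R2016b or later

disp(size(B1));                       % 3 3 3
disp(isequal(B1, B2));                % 1 on R2016b or later
```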

Asked: 28 May 2018
Commented: 29 May 2018
