execution time with or without parfor
I have a simple piece of code for testing parfor with my local profile (4 cores):
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% code 1: parfor version
matlabpool open 4 % also run with 2 and with 1 worker
tic;
parfor i = 1:30
res = 0;
for n = 1 : 3000000
res = res + sin(n) + cos(n);
end
A(i) = res;
end
toc;
matlabpool close
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% code 2: serial version (plain for loop)
tic;
for i = 1:30
res = 0;
for n = 1 : 3000000
res = res + sin(n) + cos(n);
end
A(i) = res;
end
toc;
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
I ran code 1 with 8, 4, 2, and 1 labs, and ran code 2 once. The results are:
code 1, 8 labs (4 cores plus 4 hyper-threads) --> 15 s
code 1, 4 labs --> 22 s
code 1, 2 labs --> 35 s
code 1, 1 lab --> 65 s
code 2 (serial) --> 18 s
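Turning the reported times into ratios makes the pattern easier to see. The numbers suggest that code 1 does scale as labs are added (about 3x from 1 to 4 labs), but that even the 1-lab run is about 3.6x slower than code 2, so the loss looks more like a per-worker penalty than poor scaling. This is only arithmetic on the timings above; the variable names are illustrative.

```matlab
% Ratios computed from the timings reported above (all in seconds).
labs    = [8 4 2 1];        % number of labs used for code 1
tParfor = [15 22 35 65];    % code 1 (parfor) run times
tSerial = 18;               % code 2 (plain for loop) run time

% > 1 means code 1 beat code 2; < 1 means parfor was slower
speedupVsSerial = tSerial ./ tParfor;
% How code 1 scales relative to its own 1-lab run
speedupVsOneLab = tParfor(end) ./ tParfor;

fprintf('%d labs: %.2fx vs. code 2, %.2fx vs. 1 lab\n', ...
    [labs; speedupVsSerial; speedupVsOneLab]);
```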
Given these results, it seems better to use code 2 and leave the other cores free (even before counting the time needed to run 'matlabpool open' and 'matlabpool close'). I have read this: http://www.mathworks.co.uk/matlabcentral/answers/44734-there-is-aproblem-in-parfor
but in this case the execution time seems much longer than the setup time of the parallel mechanism.
If there is nothing wrong with my results, my main question is: when is it better to use parfor?
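One way to answer "when is parfor worth it" empirically is to time the serial loop, the pool start-up, and the parfor loop separately. The sketch below assumes R2013b or later, where parpool and gcp exist (matlabpool, used in the question, was removed in R2015a), the Parallel Computing Toolbox, and a local profile with at least 4 workers; adjust the pool size to your machine.

```matlab
% Sketch: time serial for, pool start-up, and parfor separately.
nIter  = 30;          % outer iterations, as in the question
nInner = 3000000;     % inner loop length, as in the question

% Serial baseline (code 2, with the output preallocated)
tic;
A = zeros(1, nIter);
for i = 1:nIter
    res = 0;
    for n = 1:nInner
        res = res + sin(n) + cos(n);
    end
    A(i) = res;
end
tSerial = toc;

% Pool start-up: close any running pool first so its cost is measured
delete(gcp('nocreate'));
tic;
pool = parpool(4);    % 4 workers, matching the question's machine
tSetup = toc;

% Parallel version (code 1, with the output preallocated)
tic;
B = zeros(1, nIter);
parfor i = 1:nIter
    res = 0;                       % temporary, private to each iteration
    for n = 1:nInner
        res = res + sin(n) + cos(n);
    end
    B(i) = res;                    % sliced output
end
tParfor = toc;

fprintf('serial %.1f s | pool start-up %.1f s | parfor %.1f s\n', ...
    tSerial, tSetup, tParfor);
delete(pool);         % shut the pool down again
```

With a freshly opened pool, parfor only pays off when tSetup + tParfor is smaller than tSerial; if the pool is opened once and reused for many loops, comparing tParfor with tSerial is the more relevant test.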
17 Comments
Matt J
on 3 Feb 2014
I can't reproduce that, I'm afraid. I see close to linear speed-up with 2, 4, and 12 workers in the pool. What version of MATLAB are you using, and what CPU(s)?
Edric Ellis
on 4 Feb 2014
NUMLABS is designed to return 1 inside PARFOR because you cannot use labSend/labReceive there. This is described in the documentation.
amir
on 4 Feb 2014
Matt J
on 4 Feb 2014
NUMLABS will only return a meaningful value inside an SPMD...END block.
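A small sketch of the difference, assuming the Parallel Computing Toolbox and R2013b or later (for gcp and parpool). In newer releases spmdSize is the recommended replacement for numlabs, but numlabs is used here to match the thread.

```matlab
% Sketch: what numlabs reports inside parfor vs. inside spmd.
pool = gcp('nocreate');
if isempty(pool)
    pool = parpool;              % start a pool with the default profile
end

inParfor = zeros(1, 4);
parfor k = 1:4
    inParfor(k) = numlabs;       % returns 1 inside parfor, by design
end

spmd
    inSpmd = numlabs;            % number of workers running the spmd block
end

disp(inParfor)
fprintf('numlabs inside spmd: %d (pool has %d workers)\n', ...
    inSpmd{1}, pool.NumWorkers);
```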
Matt J
on 4 Feb 2014
@mohammad
Are there any other machines available to you that you could test it on, to check whether the problem is platform-dependent?
amir
on 4 Feb 2014
amir
on 5 Feb 2014
As I mentioned here, I ran the first version of the code and successfully achieved near linear speed-up with PARFOR. That was with R2013b. I haven't run the second version of the code yet, but I don't see any significant modification in it that would lead me to expect a different result.
So, the slow behavior you're seeing has to be environment-related.
Here are my results (times in seconds) when I run the modified version of the test code for poolsize = 0:12. The three timing columns correspond to R2011b, R2012b, and R2013b:
Pool size   R2011b    R2012b    R2013b
0           19.9430   20.4689   21.0302
1           21.1632   21.8318   23.0208
2           10.6021   10.7968   11.5326
3            7.0738    7.3209    7.9293
4            5.7969    5.9354    6.1944
5            4.3994    4.5522    4.9174
6            3.7105    3.8611    4.1811
7            3.6653    3.7533    3.9924
8            3.0179    3.1299    3.2726
9            2.9612    3.0899    3.2563
10           2.3155    2.3643    2.5791
11           2.3111    2.3792    2.5677
12           2.3000    2.3633    2.6129
Interestingly, performance gets a bit slower with more recent releases. I'm not sure that's a significant trend, though. This is on dual Intel Xeon X5680 CPUs (six cores each) @ 3.33 GHz.
So... still baffled.
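Converting the times above into speed-ups makes "close to linear" easier to check. This sketch uses only the numbers from the table; speed-up is taken relative to pool size 0 (no pool), and efficiency is speed-up divided by pool size. For example, at 12 workers the R2011b speed-up is 19.9430/2.3000, about 8.7, or an efficiency of about 0.72. bsxfun is used so the code also runs on releases without implicit expansion.

```matlab
% Rows: pool sizes 0:12. Columns: R2011b, R2012b, R2013b. Times in seconds.
poolSize = (0:12)';
T = [19.9430 20.4689 21.0302
     21.1632 21.8318 23.0208
     10.6021 10.7968 11.5326
      7.0738  7.3209  7.9293
      5.7969  5.9354  6.1944
      4.3994  4.5522  4.9174
      3.7105  3.8611  4.1811
      3.6653  3.7533  3.9924
      3.0179  3.1299  3.2726
      2.9612  3.0899  3.2563
      2.3155  2.3643  2.5791
      2.3111  2.3792  2.5677
      2.3000  2.3633  2.6129];

% Speed-up relative to the no-pool run (row 1), per release
speedup = bsxfun(@rdivide, T(1, :), T);
% Efficiency for pool sizes 1..12 (pool size 0 has no meaningful efficiency)
efficiency = bsxfun(@rdivide, speedup(2:end, :), poolSize(2:end));

disp([poolSize, speedup]);   % pool size, then one speed-up column per release
```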
Matt J
on 6 Feb 2014
Any difference if you pre-allocate A first?
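Applied to code 1, the suggestion is one extra line before the loop. This is only a sketch of the change; the thread does not establish whether it affects the timings on the machine in question. It assumes a pool is already open (matlabpool in older releases, parpool from R2013b); without one, parfor either starts a pool automatically (if your preferences allow it) or runs the iterations serially on the client.

```matlab
% Sketch: code 1 with A preallocated before the parfor loop.
A = zeros(1, 30);              % A exists at its final size before the loop
tic;
parfor i = 1:30
    res = 0;                   % temporary: private to each iteration
    for n = 1 : 3000000
        res = res + sin(n) + cos(n);
    end
    A(i) = res;                % sliced output: iteration i writes only A(i)
end
toc;
```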
Matt J
on 6 Feb 2014
I wish I had a system like that. Do you fly with it?
Not always. Like you, I've also had cases where PARFOR mysteriously under-performs in environment-dependent ways. See this thread, for instance.
Matt J
on 6 Feb 2014
You're not doing any of this over a network are you? This is all on a local CPU?