PARFOR is 10X Slower than FOR
Paul Safier
on 16 Jun 2022
Commented: Paul Safier
on 21 Jun 2022
I'm trying to understand why my parfor loop is so much slower than an equivalent for loop.
Both the parfor and for implementations require some I/O, namely reading a 20 KB PNG file (buildingDesign.png, attached).
The function being called (getSpace_HELP.m) reads this PNG file, converts it into a BW matrix, and performs several image processing operations on the matrix. Its purpose is to calculate the average spacing between objects in an image.
The code shown here is a pared-down, simplified version that includes only the core of the computation. I'm using 36 workers on a Linux machine. The attached histograms show the times for 1000 calls to getSpace_HELP.m.
Why would the parfor implementation be so much slower, and are there ways to speed it up? In actual use, the spacing routine could be called ~10^8 times, so this time difference is important.
ntot = 1000; % How many runs to test
% Create a directory to place copies of the file in. This is just to
% simulate the actual use of this code for troubleshooting.
mkdir tmp1
for jj = 1:ntot
theName = ['./tmp1/',num2str(jj),'_clipped.png'];
copyfile('./Files/buildingDesign.png',theName);
end
%
resultsMat = zeros([ntot 1]);
timeMat = zeros([ntot 1]);
tic
% Comment out one of the next two lines to switch between parfor and for
%parfor k = 1:ntot
for k = 1:ntot
fileName = ['./tmp1/',num2str(k),'_clipped.png'];
[singleResult,theTime] = getSpace_HELP(fileName);
resultsMat(k) = singleResult;
timeMat(k) = theTime;
end
timeAll = toc; % stop the timer before plotting so the plot is not timed
histogram(timeMat), xlabel('Time (s)')
disp(['Time to do all runs: ',num2str(timeAll)])
Edric Ellis
on 20 Jun 2022
Aha. I suspect that bwlookup can take advantage of MATLAB's intrinsic multithreading. If so, your multithreaded desktop MATLAB process is already making full use of all the cores on your system, whereas the workers in a parallel pool run in single-threaded mode by default. You can confirm this either by monitoring the processor utilisation of desktop MATLAB with top (or similar), or by forcing desktop MATLAB into single-threaded mode with maxNumCompThreads(1) and comparing the timings.
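A minimal sketch of that single-threaded comparison, assuming the loop body is wrapped in a function handle. The function name timeSerialLoop and the workFcn argument are hypothetical, not part of the original code. maxNumCompThreads(N) returns the previous thread count, which is used here to restore the setting afterwards.

```matlab
function [tMulti, tSingle] = timeSerialLoop(workFcn, ntot)
% Hypothetical helper (save as timeSerialLoop.m): times a plain for-loop
% with MATLAB's current number of computational threads, then again
% forced to one thread, which mimics how pool workers run by default.
%   workFcn - handle called as workFcn(k), k = 1..ntot (stand-in for
%             a call to getSpace_HELP)
%   ntot    - number of iterations

    % Run with the current (typically multithreaded) setting
    t0 = tic;
    for k = 1:ntot
        workFcn(k);
    end
    tMulti = toc(t0);

    % Force single-threaded mode; the previous setting is returned
    prevThreads = maxNumCompThreads(1);
    % Restore the previous thread count when the function exits, even on error
    cleanup = onCleanup(@() maxNumCompThreads(prevThreads));

    t0 = tic;
    for k = 1:ntot
        workFcn(k);
    end
    tSingle = toc(t0);
end
```

For example, [tm, ts] = timeSerialLoop(@(k) getSpace_HELP(sprintf('./tmp1/%d_clipped.png', k)), 1000) would time the question's loop both ways. If ts is much larger than tm, the serial code is probably already using several cores.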
Basically, whenever your original for-loop code is dominated by operations that MATLAB already multithreads, there is no advantage to using a local parallel pool: you're already fully utilising your machine. In cases like this, you might see a benefit from running parfor on multiple remote workers instead.
Accepted Answer
Raymond Norris
on 16 Jun 2022
@Paul Safier I believe the problem is that your tic/toc calls are nested. In clipSpacing_HELP, you call tic on line 16 and toc on line 29. However, inside the for-loop you call getSpace_HELP, which calls tic on line 6. That call becomes the new start time for the toc on line 29 in clipSpacing_HELP.
Conversely, when you call getSpace_HELP inside a parfor, the tic runs on a worker, so the toc in clipSpacing_HELP isn't aware of it and still uses the tic on line 16 (which is what you really want the for-loop to do as well).
The solution is to link tic and toc through a variable (here, t0):
t0 = tic;
%parfor k = 1:ntot
for k = 1:ntot
fileName = ['./tmp1/',num2str(k),'_clipped.png'];
[singleResult,theTime] = getSpace_HELP(fileName);
resultsMat(k) = singleResult;
timeMat(k) = theTime;
end
timeAll = toc(t0); % stop the timer before plotting so the plot is not timed
histogram(timeMat), xlabel('Time (s)')
This way, the tic/toc calls inside getSpace_HELP don't reset the timer used for timeAll. Make this change and rerun it to see whether that gives a more accurate timing.
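The effect described above can be reproduced in isolation. This is a sketch only: demoNestedTic and innerWork are hypothetical names, and pause stands in for real work. innerWork plays the role of getSpace_HELP with its own bare tic.

```matlab
function [bareElapsed, handleElapsed] = demoNestedTic()
% Hypothetical demo (save as demoNestedTic.m): a bare tic inside a called
% function resets the timer that a later bare toc reads, while a tic
% handle passed to toc is unaffected.

    tic;                        % outer start, stored in the shared timer
    pause(0.5);                 % simulated outer work
    innerWork();                % its bare tic overwrites the shared timer
    bareElapsed = toc;          % measured from innerWork's tic, not the outer one

    t0 = tic;                   % outer start, kept in its own handle
    pause(0.5);                 % same simulated outer work
    innerWork();
    handleElapsed = toc(t0);    % unaffected by innerWork's bare tic
end

function innerTime = innerWork()
% Hypothetical stand-in for getSpace_HELP, using a bare tic/toc pair
    tic;
    pause(0.1);                 % simulated per-call work
    innerTime = toc;
end
```

bareElapsed should come out close to the 0.1 s inner pause, while handleElapsed should be roughly 0.6 s, because only the handle-based timer covers the whole span.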
More Answers (1)
Steven Lord
on 16 Jun 2022
Have you tried using the parallel profiler to determine what percentage of the parfor code's run time is spent on the actual computations and how much on overhead? You could then compare those parallel profiling results with the results of running the for-loop version of the code in the MATLAB Profiler.
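As a rough starting point, here is a sketch that profiles the serial loop with the MATLAB Profiler and measures one kind of parfor overhead, data transfer, with ticBytes/tocBytes. Note that this swaps in ticBytes/tocBytes instead of the parallel profiler (mpiprofile) mentioned above. The function name profileLoops and the workFcn argument are hypothetical, workFcn is assumed to return a scalar, and the parfor part requires Parallel Computing Toolbox.

```matlab
function [profInfo, bytes] = profileLoops(workFcn, ntot)
% Hypothetical helper (save as profileLoops.m).
%   workFcn - handle returning a scalar for workFcn(k) (stand-in for
%             getSpace_HELP)
%   ntot    - number of iterations

    % 1) Serial loop under the MATLAB Profiler
    resultsFor = zeros(ntot, 1);
    profile on
    for k = 1:ntot
        resultsFor(k) = workFcn(k);
    end
    profile off
    profInfo = profile('info');   % use "profile viewer" to browse interactively

    % 2) Bytes sent to and received from each worker by the parfor loop
    pool = gcp('nocreate');
    if isempty(pool)
        pool = parpool;           % start a pool with the default profile
    end
    ticBytes(pool);
    resultsPar = zeros(ntot, 1);
    parfor k = 1:ntot
        resultsPar(k) = workFcn(k);
    end
    bytes = tocBytes(pool);       % one row per worker: [sent, received]
end
```

Large byte counts relative to the work done per iteration would point to transfer overhead rather than computation.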
In order for your code to gain time when run in parallel, the amount of time you spend in the parallel setup and other overhead must be less than the amount of time that you save by running the iterations in parallel. If your overhead is high and/or the amount of time you save is small, your parfor loop could very well take more time to run than your for loop.
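One way to check that break-even point directly is to time the same loop body both ways. This is a sketch under stated assumptions: compareForParfor and workFcn are hypothetical names, workFcn is assumed to return a scalar, and the pool is started before timing so that the one-time pool startup cost is excluded while per-loop overhead (scheduling, data transfer) is still included.

```matlab
function [tFor, tParfor] = compareForParfor(workFcn, ntot)
% Hypothetical helper (save as compareForParfor.m): times the same loop
% body as a for-loop and as a parfor-loop. The parfor version only wins
% when the time saved by splitting iterations across workers exceeds
% the parallel overhead.

    % Start the pool outside the timed region (one-time startup cost)
    if isempty(gcp('nocreate'))
        parpool;
    end

    resultsFor = zeros(ntot, 1);
    t0 = tic;
    for k = 1:ntot
        resultsFor(k) = workFcn(k);
    end
    tFor = toc(t0);

    resultsPar = zeros(ntot, 1);
    t0 = tic;
    parfor k = 1:ntot
        resultsPar(k) = workFcn(k);
    end
    tParfor = toc(t0);
end
```

A ratio tFor/tParfor below 1 means the overhead outweighs the savings for that workload; trying a cheap body (for example @(k) k^2) and a heavier one shows how the balance shifts.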
Think of grocery shopping with kids. If sending them over to the cereal aisle saves you two minutes but you have to spend five minutes searching the store to find them afterwards (eventually finding them in the candy or snacks aisle) you would be better off just going to the cereal aisle yourself.
Torsten
on 16 Jun 2022
Just out of curiosity:
What if you remove the assignments
resultsMat(k) = singleResult;
timeMat(k) = theTime;
from the parfor loop?