Simple parfor loop slow

Manuel Santana on 29 Aug 2024
Commented: Sam Marshalik on 31 Aug 2024
In my code I am running a parfor loop which has to compute many matrix inversions of a moderately sized matrix. Below is some code which captures the essence of it.
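nps = 10; foo = zeros(nps,1);
% Complex test data: row vector u, column vector v, dense 6000-by-6000 matrix
u = rand(1,6000,'like',1i); v = rand(6000,1,'like',1i);
mat = rand(6000,6000,'like',1i);
% Parallel version, limited to at most 10 workers
tic
parfor (ii = 1:nps,10)
    foo(ii) = u * (mat \ v);
end
partime = toc
% Serial version for comparison
tic
for ii = 1:nps
    foo(ii) = u * (mat \ v);
end
sertime = toc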
For some reason the parfor loop is slower than the serial loop. For example, with nps = 10 I get sertime = 11.2867 and partime = 20.7321. If I increase to nps = 100, then sertime = 111.8209 and partime = 126.8961. Note that I am running this code on a cluster using MATLAB Parallel Server with a Slurm profile and 10 workers (making more threads available to each worker didn't help either).
Any thoughts on why the parfor loop doesn't provide the speedup expected?
As a side note, in my actual code the matrix changes every loop iteration, but the above code still captures the behavior I cannot explain.

Accepted Answer

Sam Marshalik on 31 Aug 2024
I don't think ThreadPool will help here. I ran the code in a process pool and in a thread pool, and the runtimes were similar. I also double-checked how much data is being sent between the MATLAB client and the workers, and it is not a lot:
             BytesSentToWorkers    BytesReceivedFromWorkers
             __________________    ________________________
    1           576198886.00                614.00
    2           576198886.00                614.00
    3           576198886.00                614.00
    4           576198886.00                614.00
    5           576198886.00                614.00
    6           576199557.00               1081.00
    7           576199557.00               1081.00
    8           576198886.00                614.00
    Total      4609592430.00               5846.00
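For anyone who wants to reproduce a table like this, here is a minimal sketch using ticBytes and tocBytes. It assumes a process-based pool is available through gcp; the size n is deliberately smaller than the 6000 in the question so the sketch runs quickly, and the variable names are only illustrative.

```matlab
% Sketch: measure client/worker data transfer around a parfor loop.
% Assumption: gcp returns (or starts) a process-based pool.
pool = gcp;

n   = 2000;                          % reduced from 6000 to keep the run short
nps = 8;
foo = zeros(nps,1);
u   = rand(1,n,'like',1i);
v   = rand(n,1,'like',1i);
mat = rand(n,n,'like',1i);

ticBytes(pool);                      % start counting bytes for this pool
parfor ii = 1:nps
    foo(ii) = u * (mat \ v);         % mat, u and v are broadcast to each worker
end
bytes = tocBytes(pool)               % one row per worker: [sent, received]
```

Since mat is broadcast, each worker that takes part should receive about n*n*16 bytes for a complex double matrix, which is consistent with the roughly 576 MB per worker in the table for n = 6000.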
ThreadPool can certainly help when working with large data, but I do not think data transfer is the culprit here.
I think the culprit is multi-threading. Running the code in serial took me 33 seconds, while running it on a single worker with no multi-threading took 78 seconds. This means that the serial run is benefiting from multi-threading behind the scenes, whereas a worker restricted to one thread is not.
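One way to check this on your own setup is to compare the number of computational threads on the client with the number on each worker. This is only a sketch: it assumes a process-based pool from gcp and uses maxNumCompThreads together with parfevalOnAll; the variable names are made up for illustration.

```matlab
% Sketch: compare computational thread counts on the client and the workers.
% Assumption: gcp returns (or starts) a process-based pool.
clientThreads = maxNumCompThreads;               % threads available to the client

pool = gcp;
f = parfevalOnAll(pool, @maxNumCompThreads, 1);  % run once on every worker
workerThreads = fetchOutputs(f);                 % one entry per worker

fprintf('Client threads: %d\n', clientThreads);
fprintf('Worker threads: %s\n', mat2str(workerThreads(:).'));
```

If the workers report 1 while the client reports many, the serial loop is effectively getting a multi-threaded solve and each worker is not.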
I think you had the right idea of giving your parallel workers access to more threads. For example, in serial the code took 33 seconds. I then started a single worker and gave it access to 8 threads and that ran in 38 seconds (5 seconds for overhead is reasonable). I think as the problem scales up and you can have more workers with more threads you will get more of a benefit from MATLAB Parallel Server.
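A rough sketch of that experiment is below. It assumes the default cluster profile (substitute your Slurm profile name in parcluster) and that the machine or scheduler can actually grant 8 cores to one worker; the problem size is reduced from the question so it finishes quickly.

```matlab
% Sketch: one worker with 8 computational threads.
% Assumptions: default profile; 8 cores can be granted to a single worker.
delete(gcp("nocreate"));             % close any existing pool first

c = parcluster;                      % e.g. parcluster("yourSlurmProfile")
c.NumThreads = 8;                    % threads per worker
pool = parpool(c, 1);                % a single worker

n   = 2000;                          % reduced from 6000 to keep the run short
nps = 4;
foo = zeros(nps,1);
u   = rand(1,n,'like',1i);
v   = rand(n,1,'like',1i);
mat = rand(n,n,'like',1i);

tic
parfor ii = 1:nps
    foo(ii) = u * (mat \ v);
end
partime = toc
```

The same NumThreads setting can be combined with more workers, as long as workers times threads does not exceed the cores the scheduler gives you.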
P.S. you may want to explore using sliced input variables as your data gets larger, so you can send chunks of data to the workers instead of the entire matrix/array.
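As a sketch of what slicing could look like when the matrix changes every iteration: store one matrix per iteration in a cell array and index it with the loop variable, so that each worker only receives the matrices for its own iterations. The names mats and n are hypothetical, and the sizes are kept small on purpose.

```matlab
% Sketch: sliced input variable, one matrix per iteration.
% Assumption: all matrices fit in client memory at once.
n   = 500;                           % small size, for illustration only
nps = 8;

mats = cell(nps,1);
for k = 1:nps
    mats{k} = rand(n,n,'like',1i);   % stands in for the per-iteration matrix
end
u = rand(1,n,'like',1i);
v = rand(n,1,'like',1i);

foo = zeros(nps,1);
parfor ii = 1:nps
    % mats{ii} is sliced: only the ii-th matrix goes to the worker
    % running iteration ii. u and v are still broadcast, but they are small.
    foo(ii) = u * (mats{ii} \ v);
end
```

This does not help when every iteration needs the same matrix, as in the test code in the question, since that matrix has to reach every worker anyway.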

More Answers (1)

Ronit on 30 Aug 2024
Hello Manuel,
Since you are working with large complex data and 10 MATLAB workers, the data must be copied to each of the workers, and the results must be copied back. This takes time.
I would suggest that you set up your workers to be threads, not separate processes. In this way, they use shared memory and the data doesn't need copying. You can do this with parpool("threads"). This can reduce the execution time of the parfor loop.
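A minimal sketch of this suggestion, assuming a MATLAB release that supports thread-based pools. The thread pool runs on the local machine, and the problem size here is reduced from the question so it runs quickly.

```matlab
% Sketch: run the same loop on a thread-based pool.
% Assumption: the release in use supports parpool("threads").
delete(gcp("nocreate"));             % close any existing pool first
pool = parpool("threads");           % thread workers share memory with the client

n   = 2000;                          % reduced from 6000 to keep the run short
nps = 8;
foo = zeros(nps,1);
u   = rand(1,n,'like',1i);
v   = rand(n,1,'like',1i);
mat = rand(n,n,'like',1i);

tic
parfor ii = 1:nps
    foo(ii) = u * (mat \ v);
end
threadtime = toc
```

Comparing threadtime against the serial and process-pool timings on the same machine shows whether data copying is really the bottleneck for your problem.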
For more information, please refer to the documentation page "Run MATLAB Functions in Thread-Based Environment".
I hope it helps with your query!
  3 Comments
Manuel Santana on 30 Aug 2024
Great, thanks! I found that more threads and scaling the problem up did help reduce the runtime as I expected. If you repost this reply as an answer I will accept it.
