mex and parallel

Xiaochun on 6 Jan 2012
Hi,
I have a PC with 8 cores.
I am testing the parallel speedup of a mex function. Run serially, it takes 300 seconds; with spmd on 4 workers it takes nearly 500 seconds; and with 8 workers it takes nearly 1000 seconds. What is going on? In Task Manager I see that nearly half of the CPU usage is consumed by system processes. Is this because of the mex interface?
spmd
    % Initialize the mex inputs x1, x2, x3 (these are very large arrays, one of them nearly 50M)
    :
    % end of initialization
    tic;
    test(x1,x2,x3,x4); % the code under test; its running time is measured on each worker
    toc;
end
Michael

Answers (3)

Jason Ross on 6 Jan 2012
  • Do you really have 8 cores, or do you have four cores and hyper threading?
  • Are you sure in your test that you are comparing the same amount of work? Does the serial test do the same thing eight times (or four times, depending on what you are comparing)?
  • (note: I'm assuming you are on Windows) If you look at the Resource Monitor (not just Task Manager), are you taking up all the RAM on the system and using virtual memory? That will certainly impair performance.
  1 Comment
Xiaochun on 8 Jan 2012
1) Our PC has 8 cores and 64 GB of RAM, so I think it is not a memory or hyper-threading issue.
2) The test inputs x1, x2 and x3 are the same in the parallel and serial runs, and the computing load is the same in both.



Titus Edelhofer on 6 Jan 2012
Hi Xiaochun,
I'm not sure why you want to compare the code with spmd and without. Running the same code within spmd will always be slower (or at best the same speed) because you do the same thing 4 or 8 times ...?
If the matrices are large you might see worse performance due to caching phenomena: for one "worker" (the client, no spmd) the matrices might fit into second level cache, but 8 times the matrices don't.
So again: what do you expect or want to see with your test above?
Titus
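One rough way to check the caching/memory idea above is to time a memory-bound loop on every worker and see whether the per-worker time grows as more workers run at once. This is only a sketch: the array size `n`, the repeat count, and the variable names are illustrative assumptions, and it assumes Parallel Computing Toolbox with a pool already open (`parpool` in current releases, `matlabpool` at the time of this thread).

```matlab
% Rough probe for memory contention; sizes and names are illustrative only.
n = 2e7;                       % about 160 MB of doubles per worker (assumed size)
spmd
    a = rand(n, 1);            % each worker builds its own array, nothing is sent from the client
    t0 = tic;
    for r = 1:10
        a = a + 1;             % streaming update: memory traffic dominates the arithmetic
    end
    tStream = toc(t0);         % elapsed time on this worker
end
% tStream is a Composite on the client; read one entry per worker.
for k = 1:length(tStream)
    fprintf('worker %d: %.2f s\n', k, tStream{k});
end
```

Running this with pools of 1, 4 and 8 workers and comparing the printed times gives a hint: if the per-worker time rises sharply with the pool size, the cores are probably waiting on shared memory rather than on anything specific to mex.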
  2 Comments
Xiaochun on 7 Jan 2012
Hi Titus,
Thank you for your reply.
1) The "test" subroutine is a mex file compiled from C. In my research I need to run it many times with different input data, and spmd makes that easy. For example, if I have to run it 8 times and I use 8 workers, I can run all of them at the same time instead of 8 times in sequence.
2) I expected the running time of "test" on each worker to increase somewhat, but it slows down too much: a) with one worker it is nearly the same as the serial run, 300 seconds; b) with four workers in spmd it is nearly 500 seconds; c) with 8 workers it takes 1000 seconds.
3) I have seen that nearly half of the CPU usage is consumed by system processes during the parallel run. This is strange; is it related to mex?
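A like-for-like version of the comparison described in this comment might look like the sketch below: the serial baseline runs the job once per worker in a loop, and the spmd run does one job on each worker at the same time. The `work` handle is a hypothetical stand-in for the mex routine "test", and `nJobs` is assumed to equal the number of workers in an already open pool.

```matlab
% Stand-in for the mex routine "test"; this handle is hypothetical.
work  = @(n) sum(sqrt(1:n));
nJobs = 4;                     % assumed to equal the number of workers in the pool
n     = 1e7;

% Serial baseline: run the job nJobs times, one after another.
tS = tic;
for k = 1:nJobs
    work(n);
end
tSerialAll = toc(tS);

% Parallel: each worker runs the job once, all at the same time.
tP = tic;
spmd
    r = work(n);
end
tParallelAll = toc(tP);

fprintf('serial total %.2f s, spmd total %.2f s, speedup %.2f\n', ...
        tSerialAll, tParallelAll, tSerialAll / tParallelAll);
```

Comparing total time for the same number of jobs, rather than the time of a single call, is what makes the two numbers comparable.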
Titus Edelhofer on 9 Jan 2012
OK, I understand. So x1 etc. are different on each worker (apart from in this test here)? Then you end up with a speedup of 2.4 for 4 workers (4 / (500s/300s)) and a speedup of 2.4 for 8 workers (8 / (1000s/300s)). The result for 4 workers is O.K., I guess, whereas the speedup for 8 workers is indeed poor. The mex file should not be the problem (once you are inside the mex file there is no overhead there). Threading should not be the problem either (since one worker takes the same time as a run without spmd). My suspicion would be memory access (the workers/cores need to wait for memory I/O, which limits the speedup).
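The speedup arithmetic above can be reproduced with a few lines, using only the timings reported in this thread. The variable names are illustrative, and "efficiency" (speedup divided by worker count) is an extra figure added here for comparison.

```matlab
tOne    = 300;                 % seconds for one run in serial (from the thread)
workers = [1 4 8];
tWall   = [300 500 1000];      % wall time in seconds with that many workers

% Each worker completes one run, so the work done grows with the worker count.
speedup    = workers .* tOne ./ tWall;
efficiency = speedup ./ workers;

disp([workers; speedup; efficiency]);
```

This gives a speedup of 2.4 for both 4 and 8 workers, with efficiencies of 0.6 and 0.3: going from 4 to 8 workers adds no throughput at all.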



Walter Roberson on 7 Jan 2012
It seems fairly unlikely to me that it would be related to mex (but there might be an interaction I am not familiar with.)
Please remember that spmd must send the data to each of the workers, as discussed in your previous thread http://www.mathworks.com/matlabcentral/answers/25014-about-parfor-and-spmd-speedup . Are you using the Worker Object Wrapper you were referred to in another of your threads, http://www.mathworks.com/matlabcentral/answers/24862-about-the-parfor ?
  1 Comment
Xiaochun on 7 Jan 2012
Yes, I have used the Worker Object Wrapper, but it only increased the computing speed a little.
To avoid the communication involved in sending and receiving data between the workers and the client, I initialize the data on each worker (I do not include this time in the comparison).
Thank you for your comments.

