How does the client send data to the workers when calling parfor?

Hi, let's assume I want to use parfor with my function (in this case, it just runs the computation without retrieving data from the workers, except for the last operation), i.e.:
parfor i=1:n
    err = my_fnc(A,B,exp_pts(i));
end
where my_fnc is a time-expensive routine, exp_pts is a vector of n different numbers, and A and B are always the same very large matrices. I would like to know how the client transfers these data to the available workers: does it transfer the data first to worker 1, then to worker 2, and so on? If so, then with very large (and dense) data, worker 1 starts working first and could become idle while worker 2 is still working, and so on (like an unbalanced load). Also, with matrices like these, it is hard for the actual computation time to outweigh the time needed to transfer the data, so parfor is not very useful for such data. Do you agree? Thank you in advance.

Accepted Answer

Edric Ellis on 20 Aug 2013
A and B are sent once to each worker at the start of the PARFOR loop. There are certainly cases where the data transfer time dominates the computational time - but if my_fnc is truly time-expensive, this should not be the case. If you have multiple PARFOR loops using the same 'A' and 'B', you could use my Worker Object Wrapper to ensure they are transferred only once.
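As a rough illustration of the "transfer only once" idea, here is a minimal sketch that uses parallel.pool.Constant (built into Parallel Computing Toolbox from R2015b onward) rather than the Worker Object Wrapper itself, so it is a swapped-in technique and not the file mentioned above. The small random matrices, the loop count of 3, and placeholder_fnc are hypothetical stand-ins for the real A, B, and my_fnc.

```matlab
% Hypothetical stand-ins for the large matrices and the expensive routine.
A = rand(200);
B = rand(200);
exp_pts = linspace(0.1, 1, 8);
placeholder_fnc = @(A, B, p) norm(A * B, 'fro') * p;   % not the real my_fnc

% Wrap the shared data once; each worker keeps its own copy for later loops.
cA = parallel.pool.Constant(A);
cB = parallel.pool.Constant(B);

err = zeros(1, numel(exp_pts));
for iter = 1:3                       % stands in for a repeated outer loop
    parfor i = 1:numel(exp_pts)
        % .Value reads the worker-local copy instead of re-sending A and B.
        err(i) = feval(placeholder_fnc, cA.Value, cB.Value, exp_pts(i));
    end
end
```

Whether this helps depends on how the transfer time compares with the time spent in the routine; with a single PARFOR loop there is nothing to save, since the data still has to reach each worker once.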
  1 Comment
Giovanni De Luca on 27 Aug 2013
Hi Edric,
thank you for your file and suggestion. I used the WorkerObjWrapper.m file on both a powerful cluster of computers and a local multicore desktop. On the cluster, the WorkerObjWrapper was useful only for the DENSE data set, where simulation times were halved compared with running without this file (for the sparse counterpart, I obtained higher simulation times than without your file). However, my code is something like this:
while true
    % do something inexpensive
    parfor i=1:n
        err = my_fnc(A,B,exp_pts(i));
    end
    % do something else inexpensive (compared with my_fnc)
end
and the while loop is repeated a certain number of times. Also, the fact that I execute my_fnc n times, where n is greater than the number of available workers, could be considered "multiple parfor loops", right? Then your file should be useful in this case as well.

Besides, when I call matlabpool open 50 (equal to n, the upper limit of the parfor loop), I obtain a longer simulation time for a dense data set than when I call matlabpool open 25, so it seems that using fewer workers is better. Why? I send the data to the workers just once, and using more of them (up to the bound of the parfor loop) should be better.

On the local multicore desktop, on the other hand, your file does not seem to help with simulation time regardless of the data set (sparse or dense), at least in my case, where I work with very large data sets.

So I wonder what MATLAB does when one calls parfor on a cluster and on a local multicore machine. For example, on a local multicore machine, does MATLAB partition the available RAM among the workers and then send the (large) data to each one, and so on? And on a cluster, where each node has its own local RAM, does it skip the partitioning and just send the data? I mean, there must be other bottlenecks, since I cannot reach a good speedup (even allowing for Amdahl's law), that is, other routines MATLAB calls when starting up the parfor loop. Can you explain how the parfor loop works in these terms?
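For reference, the Amdahl's law bound mentioned here can be sketched as follows. The parallel fraction p = 0.9 is a purely hypothetical value chosen for illustration, not something measured from my_fnc; the worker counts 25 and 50 are the pool sizes from the comment.

```matlab
% Amdahl's law: speedup(N) = 1 / ((1 - p) + p / N)
p = 0.9;                 % hypothetical fraction of run time that parallelizes
workers = [1 25 50];     % serial baseline plus the two pool sizes discussed
speedup = 1 ./ ((1 - p) + p ./ workers);

% The bound can never exceed 1 / (1 - p), however many workers are used.
limit = 1 / (1 - p);
fprintf('N = %2d: speedup bound %.2f (limit %.2f)\n', [workers; speedup; limit * ones(size(workers))]);
```

Under this model, going from 25 to 50 workers gives a smaller gain but never a slowdown, so a longer run time with 50 workers would point to overheads outside the formula, such as data transfer or contention for memory.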
