When does the data copy happen when executing a parallel job?
Hello,
when I call a function that I want to run in parallel (for example, inside a parfor statement), and this function receives some data as inputs, is that data copied to each worker's portion of RAM just once, after the matlabpool open command is called, or every time, e.g., a parfor block is executed? Thank you in advance.
Accepted Answer
Matt J
on 12 Dec 2013
Edited: Matt J
on 12 Dec 2013
Normally, data is broadcast to the workers each time a parfor block is executed, but this FEX file (WorkerObjWrapper) offers a way to make data persist from one parfor block to subsequent ones.
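To illustrate the difference, here is a rough sketch (someFun and otherFun are hypothetical user functions, and the exact WorkerObjWrapper API should be checked against the FEX submission itself):

```matlab
% Without persistence: A is serialized and sent to every worker
% at the start of EACH parfor block.
A = rand(1e4);                    % large broadcast variable
parfor i = 1:100
    y(i) = someFun(A, i);         % A is copied to the workers here...
end
parfor i = 1:100
    z(i) = otherFun(A, i);        % ...and copied again here
end

% With WorkerObjWrapper, the data is transferred once and then
% reused by each worker across subsequent parfor blocks:
w = WorkerObjWrapper(A);          % one-time transfer to every worker
parfor i = 1:100
    y(i) = someFun(w.Value, i);   % w.Value is the worker-local copy
end
parfor i = 1:100
    z(i) = otherFun(w.Value, i);  % no re-broadcast of A
end
```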
3 Comments
Giovanni De Luca
on 12 Dec 2013
Hi Matt, thank you for this satisfying answer. I have a follow-up comment on which I would like your opinion. Assume we are working on a SHARED-MEMORY, LOCAL computer with large matrices (dimensions greater than 10^4 x 10^4). The WorkerObjWrapper function avoids re-creating this data (which is referenced inside multiple parfor blocks), which can be quite time-consuming. Well, I'm working with dense and sparse matrices of order up to 10^5 (in the sparse case), both with and without the WorkerObjWrapper function, and the CPU times are not very different in the two cases.

By contrast, when working with, e.g., a cluster of distributed machines (fortunately, I have access to one), i.e., a DISTRIBUTED-MEMORY architecture, the WorkerObjWrapper function seems very useful: before letting the parallel workers do the computations, I send all my large data to each of them, using WorkerObjWrapper to persistently store the data in the RAM of each worker (thus also avoiding multiple data transfers), and the speedup factor increases!

I think this function could also be useful for DISTRIBUTED-MEMORY LOCAL desktops, where each processor has its own RAM. What do you think about it? In particular, I'm trying to find the reason for the slowdown I get on my embarrassingly parallel problem with large data sets: the more workers I add, the worse the slowdown becomes. Since I'm working on a shared-memory local desktop, I suspect bus contention, but I should probably investigate in more detail...
Matt J
on 13 Dec 2013
I think this function could be useful also for DISTRIBUTED-MEMORY LOCAL desktops, where each processor has its own RAM. What do you think about it?
I'm by no means an expert, but avoiding unnecessary rebroadcasting of data seems like it should be a good idea independently of the architecture.
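One rough way to check whether broadcast cost (rather than compute) dominates is to time the same parfor block at different pool sizes; if the elapsed time grows as workers are added, data transfer or contention is likely the bottleneck. A sketch (matlabpool was the pool command in 2013-era releases; newer releases use parpool instead):

```matlab
A = rand(1e4);                   % large shared matrix
for n = [1 2 4]
    matlabpool('open', n);       % parpool(n) in newer releases
    tic
    parfor i = 1:100
        s(i) = sum(A(:, i));     % deliberately cheap per-iteration work
    end
    fprintf('%d workers: %.2f s\n', n, toc);
    matlabpool('close');
end
```

With per-iteration work this cheap, any increase in the timings as n grows points at the cost of shipping A to the workers rather than at the computation itself.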