
Best read-only data strategy for parfor

Robin on 18 Oct 2012
I am using parfor on a grid with 60 workers.
I have some data which will be used read-only within the parfor loop.
I see two options: load it on the machine I am submitting from, so that it is serialized and sent across the network (dedicated gigE for the cluster), or load it from disk within the loop.
Can anyone comment on which of these might be the best strategy for different data sizes? The data compresses very well, so it is about 20MB on disk but more than 1GB in memory when loaded. How does the speed of loading and uncompressing compare with serialisation?
If I have it loaded on the submission machine, is MATLAB clever enough to serialize and send it once to each worker, or will it repeat this on every iteration? Obviously, loading from a file would be done on every iteration.
Any advice appreciated.

Answers (1)

Edric Ellis on 18 Oct 2012
I would recommend trying my Worker Object Wrapper. It's designed for just this sort of situation. In your case, you should put the files in a location available to the workers, and have them load the data using something like this:
w = WorkerObjectWrapper( @loadHugeData );
The object 'w' is then effectively a handle to the data. When you pass this into a PARFOR loop, the workers can then access the underlying data, like so:
parfor ii = 1:N
    doSomethingWith( w.Value );
end
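
As a rough sketch of what a loader such as loadHugeData might look like: it takes no inputs and returns the data itself rather than the struct that load produces. Everything named here is an assumption made for illustration: the file name hugeData_example.mat, the variable name bigTable, and the use of tempdir in place of the shared location the workers would really read from. The wrapper call is left as a comment because WorkerObjectWrapper is a separate download.

```matlab
% Hypothetical stand-in for the real data file. In practice this would be
% the compressed MAT-file stored somewhere every worker can read.
dataFile = fullfile(tempdir, 'hugeData_example.mat');
bigTable = magic(4);                 % placeholder for the real data
save(dataFile, 'bigTable');

% Hypothetical loader: no inputs, returns the array itself.
% load returns a struct, so getfield pulls out the one variable we want.
loadHugeData = @() getfield(load(dataFile, 'bigTable'), 'bigTable');

% With the wrapper on the path, the loader would be handed over like this:
%   w = WorkerObjectWrapper( loadHugeData );
% and w.Value inside the PARFOR loop would be whatever the loader returns.
data = loadHugeData();               % what w.Value is assumed to hold

colSums = sum(data, 1);              % stand-in for doSomethingWith
delete(dataFile);                    % tidy up the example file
```

The point of returning the array directly is that the loop body can then use w.Value without knowing how the file is laid out.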

