How do I distribute N 3-dimensional (large) arrays for processing across N workers?

I would like to accelerate processing of a large set of radar event data using Parallel Computing. I have a server with 48 cores and 512GB of RAM, so all of the data I need to process will fit into the local computer's memory, with enough cores to process each independent set of events. The data I want each core to process consists of 8 channels of IQ data, each of which is a matrix of S samples x P pulses; i.e., I would like to distribute an 8 x S x P matrix to each worker.
Currently the data is loaded from Nx8 files into an Nx8xSxP matrix, which I would like to distribute to N workers. The file reading is actually quite slow since it is done by a single processor, so perhaps the first question is whether I could have each worker load its own set of 8 files.
Otherwise, how do I distribute each 8xSxP matrix to my workers?
Chris Steenhoek on 12 Jun 2020
Edited: Chris Steenhoek on 12 Jun 2020
Thanks Walter. Great explanation on the memory considerations for the data organization. I certainly understand the need for the data to get created on the worker. This is the reason Edric's recommended approach "works" -- with the caveat of the disk controller limitations both he and you have pointed out (great info there too, btw).
I was hoping there was some way to allocate "shared" memory such that the workers would just get a pointer to their slice of the memory (rather than the data itself needing to be transferred to the workers before the parfor executes). The distributed/codistributed functions seem to almost be what I want but not quite.
I'll do some experiments on our server to see what my limitations are for having the workers load the data files within the parfor loop.


Accepted Answer

Edric Ellis on 9 Jun 2020
The best approach probably depends on the operations you need to perform on this Nx8xSxP array. Are the operations that you wish to perform such that you can consider "slices" of the array independently? I.e. can each 8xSxP slice of the array be operated on independently? If so, you could consider an approach like this:
parfor i = 1:N
    myData = zeros(8, S, P);
    for f = 1:8
        % Here, readData reads one file returning a matrix
        % of size SxP
        myData(f, :, :) = readData(i, f);
    end
    % Here, "compute" operates on 8xSxP array, giving some result
    result(i) = compute(myData);
end
Even with this approach, be aware that the file reading might be slow because of the limitations of the disk hardware you're reading from. If this is a spinning disk, it might actually be counter-productive to try and have multiple workers attempting to read different files simultaneously.
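For a quick trial of the parfor loop above without any real files, it can be run as a self-contained sketch. The sizes and the two function handles below (readStub, computeStub) are hypothetical stand-ins for readData and compute, not part of the original answer; they only make the loop runnable.

```matlab
% Small illustrative sizes; the real S and P are much larger.
N = 4; S = 16; P = 8;

% Hypothetical stand-ins for the real file reader and processing step.
readStub    = @(i, f) (i + f/10) * ones(S, P);  % pretends to read one SxP channel file
computeStub = @(x) sum(x(:));                   % pretends to reduce an event to a scalar

result = zeros(1, N);                 % preallocate the sliced output
parfor i = 1:N
    myData = zeros(8, S, P);          % temporary array, created on the worker
    for f = 1:8
        % feval avoids any ambiguity between calling a handle and
        % indexing a variable inside the parfor body.
        myData(f, :, :) = feval(readStub, i, f);
    end
    result(i) = feval(computeStub, myData);
end
```

Because myData is created inside the loop body, nothing large is sent from the client to the workers; only the scalar result(i) travels back.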
If the operations you need to perform are not as easily "sliced" as in the example above, then it might be better to consider using "distributed arrays".
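A minimal sketch of the distributed-array route, assuming a parallel pool is available and using small made-up sizes. The array here is random data, and the reduction across events is only an illustration of an operation that does not slice neatly per event.

```matlab
% Illustrative sizes only; X stands in for the real 8 x S x P x N data.
N = 4; S = 16; P = 8;
X = rand(8, S, P, N);

% distributed() copies X from the client and partitions it across the
% pool workers (by default along one dimension, typically the last).
D = distributed(X);

% Operations on D run on the workers; gather() brings the result back.
% Here: the mean over all N events, an operation that spans slices.
eventMean = gather(sum(D, 4)) / N;
```

Note that distributed(X) still transfers the data from the client to the workers, so it does not by itself avoid the copy; building the array on the workers inside spmd would be needed for that.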
Chris Steenhoek on 12 Jun 2020
Thanks Edric. I had a bit of time on our server today and did a quick test where I reordered my matrix from NCSP to CSPN and did the file reads inside the parfor. I started slow with N=9 and it was loading 9x 280MB files in parallel with seemingly zero issues. There's a file for each channel for each event so with N=9 and C=8, that's 72 of these files. My load time reduced from 467 seconds to 45 seconds which is somehow actually greater than a factor of 9 reduction. I didn't get a chance to push this to see how far it will scale (my ultimate goal is N=45) but it is certainly promising.
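The reordering described here can be sketched as follows, under the assumption that each event is read into a temporary and then stored in the last dimension of a C x S x P x N array. The reader handle readStub and the sizes are hypothetical. Since MATLAB stores arrays in column-major order, putting N last makes each event's C x S x P block contiguous in memory, and indexing only the last dimension by the loop variable lets parfor treat the array as a sliced output.

```matlab
% Small illustrative sizes; the real files are about 280MB each.
N = 3; C = 8; S = 4; P = 5;

% Hypothetical stand-in for reading one channel file of one event.
readStub = @(i, c) (10*i + c) * ones(S, P);

dataCSPN = zeros(C, S, P, N);         % event index N is the last dimension
parfor i = 1:N
    event = zeros(C, S, P);           % temporary, built on the worker
    for c = 1:C
        event(c, :, :) = feval(readStub, i, c);
    end
    dataCSPN(:, :, :, i) = event;     % sliced output: one contiguous block per event
end

% Speedup implied by the timings quoted above (467 s serial, 45 s parallel).
speedup = 467 / 45;
```

The quoted timings work out to a speedup of about 10.4 with N=9.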
With the SPMD approach you've provided as a backup plan I feel very good about the path I'm on. I greatly appreciate the explanations and well commented examples that both you and Walter have provided. I'll try to post an update once I get this all worked out. In my experiment with the first 9 events, I found that the code I'm working with isn't exactly memory efficient, so I need to add some memory management to keep within my 512GB. That will probably keep me busy for a few days.
Many thanks.
