How do I distribute N 3-dimensional (large) arrays for processing across N workers?
Chris Steenhoek on 8 Jun 2020
I would like to accelerate processing of a large set of radar event data using Parallel Computing. I have a server with 48 cores and 512 GB of RAM, so all of the data I need to process will fit into the local computer's memory, with enough cores to process each independent set of events. The data I want each core to process consists of 8 channels of IQ data, each of which is a matrix of S samples x P pulses; i.e., I would like to distribute an 8 x S x P matrix to each worker.
Currently the data is loaded from Nx8 files into an Nx8xSxP matrix, which I would like to distribute to N workers. The file reading is actually quite slow since it is done by a single processor, so perhaps the first question is whether I could have each worker load its own set of 8 files.
Otherwise, how do I distribute each 8xSxP matrix to my workers?
Edric Ellis on 9 Jun 2020
The best approach probably depends on the operations you need to perform on this Nx8xSxP array. Are the operations that you wish to perform such that you can consider "slices" of the array independently? I.e. can each 8xSxP slice of the array be operated on independently? If so, you could consider an approach like this:
parfor i = 1:N
    % Preallocate the 8xSxP block for this iteration
    myData = zeros(8, S, P);
    for f = 1:8
        % Here, readData reads one file returning a matrix
        % of size SxP
        myData(f, :, :) = readData(i, f);
    end
    % Here, "compute" operates on 8xSxP array, giving some result
    result(i) = compute(myData);
end
Even with this approach, be aware that the file reading might be slow because of the limitations of the disk hardware you're reading from. If this is a spinning disk, it might actually be counter-productive to have multiple workers attempting to read different files simultaneously.
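If parallel reading does turn out to be slower on your hardware, one option is to keep loading the data on the client and let parfor slice the array for you. The following is only a minimal sketch under assumptions: the sizes are small illustrative values, rand stands in for your loaded IQ data, and the sum is a placeholder for your real computation. Because the loop variable indexes the first dimension of "data", parfor treats it as a sliced variable, so each worker should receive only the slices for the iterations it runs.

```matlab
% Small illustrative sizes (assumptions, not the real data dimensions)
N = 4; S = 16; P = 8;

% Stand-in for the Nx8xSxP array already loaded on the client
data = rand(N, 8, S, P);

result = zeros(N, 1);
parfor i = 1:N
    % data(i, :, :, :) is a sliced input: only this slice is sent
    % to the worker that executes iteration i
    slice = reshape(data(i, :, :, :), 8, S, P);

    % Placeholder for the real processing of one 8xSxP block
    result(i) = sum(slice(:));
end
```

The reshape only drops the leading singleton dimension so that the placeholder computation sees an 8xSxP array, matching what "compute" expects in the example above.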
If the operations you need to perform are not as easily "sliced" as in the example above, then it might be better to consider using "distributed arrays".
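As a rough illustration of the distributed array route, here is a minimal sketch. It assumes a parallel pool is available, uses small made-up sizes with rand as a stand-in for the real data, and computes a per-channel summed power purely as a placeholder operation. Whether every function you need supports distributed arrays is something to check against the documentation for your release.

```matlab
% Small illustrative sizes (assumptions, not the real data dimensions)
N = 4; S = 16; P = 8;

% Stand-in for the Nx8xSxP array loaded on the client
data = rand(N, 8, S, P);

% Partition the array across the workers of the parallel pool
D = distributed(data);

% Elementwise operations and reductions run on the workers.
% Placeholder computation: power summed over samples and pulses,
% giving one value per event and channel (N-by-8).
power = sum(sum(abs(D).^2, 3), 4);

% Bring the (small) result back to the client as an ordinary array
localPower = gather(power);
```

Note that this still starts from an array built on the client, so it does not by itself address the slow file reading.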