Best Practice to Distribute Data to Workers?

Hi,
I wonder if there is a known best practice for distributing data from the client to the workers efficiently, in terms of both time and memory. Suppose we have a large matrix A on the client and want to distribute it across the workers column-wise. Assume that:
  • A is the result of some complicated operations, so we can't generate its columns (or rows) in parallel on the workers
  • A fits in memory (no datastore is needed)
What would be the best way to distribute A to the workers?
I made the following comparison:
n = 512;
n_workers = 25;
A = rand(n^2, n); % generate synthesized data A
% method 1: distributed
tic;
A_dist = distributed(A);
t1=toc;
fprintf("t1 = %7.4e\n", t1)
clear A_dist
% method 2: Composite -> distributed
tic;
A_dist = Composite();
chunk_size = ceil(n/n_workers);
for i = 1 : n_workers-1
    A_dist{i} = A(:,chunk_size*(i-1)+1:chunk_size*i);
end
A_dist{n_workers} = A(:,chunk_size*(n_workers-1)+1:end);
A_dist = distributed(A_dist, 2);
t2=toc;
fprintf("t2 = %7.4e\n", t2)
clear A_dist
% method 3: spmd + codistributed
tic;
spmd
    A_dist = codistributed(A, codistributor('1d', 2));
end
t3=toc;
fprintf("t3 = %7.4e\n", t3)
clear A_dist
I observe that method 2 is consistently faster than method 1, and both are significantly faster than method 3. Typical output is shown below (the ranking and the size of the gaps are quite robust):
t1 = 3.0949e+00
t2 = 2.2290e+00
t3 = 1.7517e+01
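A rough way to reason about the gap for method 3 is to estimate how much data each method has to move. The sketch below assumes that a client variable referenced inside an spmd block is copied in full to every worker (after which codistributed keeps only the local columns), and that methods 1 and 2 send each column block roughly once. Both are assumptions about the transfer pattern, not measurements.

```matlab
% Back-of-the-envelope transfer volume for the benchmark above.
% Assumption: method 3 ships all of A to each worker, while
% methods 1 and 2 ship each column block roughly once.
n = 512;
n_workers = 25;
bytes_A = n^2 * n * 8;       % A is (n^2)-by-n doubles, 8 bytes per element
gib = 2^30;

fprintf("size of A:            %5.1f GiB\n", bytes_A / gib);
fprintf("methods 1-2 (approx): %5.1f GiB sent\n", bytes_A / gib);
fprintf("method 3 (approx):    %5.1f GiB sent\n", n_workers * bytes_A / gib);
```

With these sizes A is exactly 1 GiB, so under the assumption above method 3 would move about 25 GiB. That agrees in direction with t3 being much larger than t1 and t2, although the measured ratio is smaller than 25.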
Is there any better way than my method 2?
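One variant of method 2 that may be worth timing is to keep the Composite step but assemble the array inside spmd with codistributed.build and an explicit, balanced partition, instead of calling distributed(A_dist, 2). The following is a minimal sketch under stated assumptions: a process-based pool is available, the small matrix is only a stand-in for the real A, and the names counts, edges, and C are illustrative. I have not benchmarked it against method 2.

```matlab
% Sketch: hand each worker its column block, then assemble a
% distributed array with an explicit column partition.
% Requires Parallel Computing Toolbox.
A = rand(64, 100);                 % small stand-in for the real A
pool = gcp();                      % start or reuse a parallel pool
w = pool.NumWorkers;
[m, n] = size(A);

% Balanced column counts: the first mod(n, w) workers get one extra column.
counts = floor(n / w) * ones(1, w);
counts(1:mod(n, w)) = counts(1:mod(n, w)) + 1;
edges = [0, cumsum(counts)];       % block i is columns edges(i)+1 : edges(i+1)

% Send each worker only its own block.
C = Composite();
for i = 1:w
    C{i} = A(:, edges(i)+1 : edges(i+1));
end

spmd
    % Inside spmd, C refers to this worker's local block.
    codist = codistributor1d(2, counts, [m, n]);
    A_dist = codistributed.build(C, codist);
end

% Sanity check on the small example: the round trip reproduces A.
assert(isequal(gather(A_dist), A));
```

Compared with the chunking in method 2, where the last worker receives a noticeably smaller block (8 columns versus 21 for n = 512 and 25 workers), the balanced partition gives every worker either floor(n/w) or floor(n/w)+1 columns. Whether that changes the timing is something to measure.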
I am also wondering about the mirror question: what would be the best practice for gathering data from the workers back to the client? Basically, it should be the inverse of my code: obtaining a (large) matrix A on the client from a distributed array A_dist.
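For reference, here are two ways the gathering direction can be written. The first uses the built-in gather; the second pulls each worker's local part back through a Composite and concatenates on the client, which mirrors method 2. This is a sketch on a small stand-in matrix, with illustrative variable names (part, blocks), and it makes no claim about which option is faster.

```matlab
% Sketch: two ways to bring a distributed array back to the client.
% Requires Parallel Computing Toolbox.
A = rand(64, 100);                 % small stand-in for the real A
pool = gcp();                      % start or reuse a parallel pool
w = pool.NumWorkers;
A_dist = distributed(A);

% Option 1: built-in gather.
A1 = gather(A_dist);

% Option 2: fetch each worker's local part and concatenate on the client.
spmd
    part = getLocalPart(A_dist);   % this worker's block
    codist = getCodistributor(A_dist);
    dim = codist.Dimension;        % dimension along which A_dist is split
end
blocks = cell(1, w);
for i = 1:w
    blocks{i} = part{i};           % transfers worker i's block to the client
end
A2 = cat(dim{1}, blocks{:});

% Sanity check on the small example: both options reproduce A.
assert(isequal(A1, A));
assert(isequal(A2, A));
```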
  4 Comments
Edric Ellis on 8 Dec 2021
parfor is probably fastest, since it can send slices of data to multiple workers simultaneously. Unfortunately, parfor is not useful for creating a distributed array, since you have no control over where the data ends up. (Ideally the distributed constructor would send slices in parallel too, but I think the current implementation doesn't.)
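To illustrate the slicing behavior mentioned above: when a loop indexes A only as A(:, j) with the loop variable j, parfor treats A as a sliced variable and sends each worker only the columns it needs. The per-column norm below is just a placeholder computation chosen for the sketch, and the result is collected on the client rather than left on the workers, which is exactly the limitation described in the comment.

```matlab
% Sketch: A is a sliced input variable, so each worker receives
% only the columns for the iterations it runs.
% Requires Parallel Computing Toolbox.
A = rand(64, 100);                 % small stand-in for the real A
n = size(A, 2);
col_norms = zeros(1, n);           % sliced output variable

parfor j = 1:n
    col_norms(j) = norm(A(:, j));  % placeholder per-column work
end

% Sanity check against a serial computation of the same quantity.
assert(max(abs(col_norms - sqrt(sum(A.^2, 1)))) < 1e-12);
```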
