Problem with spmd when data sizes increase
Hello,
I've run into a problem with a script I'm working on: once the matrices reach 130x130, the script doesn't run (it effectively hangs). With the "size" variable set to 100, the script runs fine. Change it to 130 and it doesn't run at all. The "disp" commands do not print; nothing happens.
Can anybody help with this issue?
Here's the code:
clear
close all
clc
%initialize input data
size = 100;
a = eye(size);
b = eye(size);
%check and make sure pool is closed (can't open multiple pools!)
if(matlabpool('size') > 0)
    matlabpool close
end
matlabpool open
tic;
%concurrent environment
spmd
    if(labindex == 1)
        disp('lab1 start');
        comp4 = labReceive('any',4);
        comp3 = labReceive('any',3);
        disp('lab1:comp6');
        comp6 = comp4;
        labSend(comp6,2,6);
        comp5 = labReceive('any',5);
        disp('lab1:comp9');
        comp9 = comp6;
        labSend(comp9,2,9);
        disp('lab1 done');
    elseif(labindex == 2)
        disp('lab2 start');
        disp('lab2:comp0');
        comp0 = a;
        disp('lab2:comp1');
        comp1 = comp0;
        disp('lab2:comp2');
        comp2 = comp1;
        disp('lab2:comp3');
        comp3 = comp2;
        labSend(comp3,1,3);
        labSend(comp3,3,3);
        comp4 = labReceive('any',4);
        disp('lab2:comp5');
        comp5 = comp3;
        labSend(comp5,1,5);
        comp6 = labReceive('any',6);
        disp('lab2:comp8');
        comp8 = comp5;
        disp('lab2:comp10');
        comp10 = comp8;
        disp('lab2:comp11');
        comp11 = comp0;
        comp9 = labReceive('any',9);
        disp('lab2:comp12');
        comp12 = comp11;
        disp('lab2:comp13');
        comp13 = comp12;
        disp('lab2:comp15');
        comp15 = comp13;
        comp7 = labReceive('any',7);
        disp('lab2:comp14');
        comp14 = comp12;
        disp('lab2 done');
    elseif(labindex == 3)
        disp('lab3 start');
        disp('lab3:comp4');
        comp4 = b;
        labSend(comp4,2,4);
        labSend(comp4,1,4);
        comp3 = labReceive('any',3);
        disp('lab3:comp7');
        comp7 = comp3;
        labSend(comp7,2,7);
        disp('lab3 done');
    end
end
time = toc;
fprintf('Execution time: %f ms\n',time*1e3);
matlabpool close
2 Comments
Walter Roberson
on 26 Jan 2014
Which MATLAB version are you using? How much memory do you have on your system?
Accepted Answer
Edric Ellis
on 27 Jan 2014
The problem here is that labSend is permitted to block until the corresponding labReceive is posted if the message is "too large". In practice, the point at which labSend starts to block is defined by the underlying MPI implementation - MPICH2. This point is about 128kB - which corresponds to a 128x128 double matrix.
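To make the arithmetic concrete, here is a rough sketch of the payload sizes involved. It assumes 8 bytes per double and ignores any per-message overhead; the variable names are only illustrative, and the threshold is the approximate 128kB figure quoted above, not an exact documented limit.

```matlab
% Approximate payload size of an n-by-n double matrix, in bytes.
bytesPerDouble = 8;                      % a MATLAB double is 8 bytes
n = [100 128 130];                       % matrix sizes to compare
messageBytes = n .^ 2 * bytesPerDouble;  % [80000 131072 135200]
thresholdBytes = 128 * 1024;             % assumed threshold, ~128kB

for k = 1:numel(n)
    fprintf('%dx%d double: %d bytes (%.1f kB), above threshold: %d\n', ...
        n(k), n(k), messageBytes(k), messageBytes(k) / 1024, ...
        messageBytes(k) > thresholdBytes);
end
```

Under these assumptions a 100x100 matrix (80000 bytes) fits comfortably below the threshold, while a 130x130 matrix (135200 bytes) is over it, which matches the behaviour described in the question.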
To fix this, you can take one of two approaches:
- If you can rework your problem to use a completely deterministic communication pattern, then labSendReceive is the best way to avoid locking up.
- If your communication pattern cannot be predicted, then you might try having a deterministic round of communication to enable everyone to agree on who they need to communicate with - and then use labSendReceive. See the example below for the sort of thing I mean:
if isempty(gcp('nocreate'))
    parpool('local', 4);
end
spmd
    for idx = 1:10
        if labindex == 1
            % lab 1 is in control of the communication pattern. Pick a random permutation,
            % but ensure no-one is trying to send to themselves.
            while true
                sendTo = randperm(numlabs);
                if all(sendTo ~= 1:numlabs)
                    % Ok
                    break
                end
            end
            labBroadcast(1, sendTo);
        else
            sendTo = labBroadcast(1);
        end
        % Each lab now has 'sendTo' - a 1 x numlabs permutation where
        % sendTo(k) is the lab that lab k should send to. The lab to
        % receive from is the one whose entry in sendTo is this lab.
        myDestination = sendTo(labindex);
        mySource = find(sendTo == labindex);
        % Each lab makes a payload and exchanges it
        payload = rand(130);
        otherPayload = labSendReceive(myDestination, ...
                                      mySource, ...
                                      payload);
    end
end
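For the first approach (a completely deterministic pattern), a minimal sketch might look like the ring shift below. This is not a rewrite of the script in the question; it only illustrates the idea, and it assumes a pool with at least two workers. The names dest, src, payload and received are illustrative.

```matlab
% Open a pool if none exists (assumes the default profile has >= 2 workers).
if isempty(gcp('nocreate'))
    parpool;
end
spmd
    % Fixed ring pattern: every lab sends to its right-hand neighbour
    % and receives from its left-hand neighbour, wrapping around.
    dest = mod(labindex, numlabs) + 1;
    src  = mod(labindex - 2, numlabs) + 1;
    % Payload is larger than the ~128kB point at which labSend may block.
    payload = labindex * ones(130);
    % The send and the matching receive are posted in a single call,
    % so no lab ends up waiting in a send for a receive that is never posted.
    received = labSendReceive(dest, src, payload);
end
```

Because every lab knows its partner in advance, the send and receive can always be paired in one labSendReceive call, regardless of the message size.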