Error using gop -> Error detected on worker N -> Error during serialization
3 views (last 30 days)
Show older comments
I'm receiving an error running distributed code on a cluster;
Error using gop (line 75)
Error detected on worker 5.
Error during serialization
Error in gplus (line 24)
y = gop(@plus, x, labTarget);
Error in hamiltonian>(spmd body) (line 654)
H=gplus(H,1); if build_aniso, Q=gplus(Q,1); end
Error in hamiltonian>(spmd) (line 611)
spmd
Error in hamiltonian (line 611)
spmd
Error in relaxation (line 111)
[L0,Q]=hamiltonian(assume(spin_system,'labframe'));
Error in decoherence_naphthalene (line 53)
R=relaxation(spin_system);
}
The code within the spmd block is;
spmd
% Localize the problem at the nodes
partition=codistributor1d.defaultPartition(nterms);
codistrib=codistributor1d(1,partition,[nterms 1]);
local_terms=getLocalPart(codistributed((1:nterms)',codistrib));
% Preallocate the local Hamiltonian
H=mprealloc(spin_system,1);
if build_aniso
Q=cell(5,5);
for m=1:5
for k=1:5
Q{k,m}=mprealloc(spin_system,1);
end
end
end
% Build the local Hamiltonian
for n=local_terms'
% Compute operator from the specification
if descr.S(n)==0
oper=operator(spin_system,descr.opL(n),{descr.L(n)},operator_type);
else
oper=operator(spin_system,[descr.opL(n),descr.opS(n)],{descr.L(n),descr.S(n)},operator_type);
end
% Add to relevant local arrays
H=H+descr.H(n)*oper;
if build_aniso
for m=1:5
for k=1:5
if abs(descr.T(n,k)*descr.phi(n,m))>spin_system.tols.inter_cutoff
Q{k,m}=Q{k,m}+descr.T(n,k)*descr.phi(n,m)*oper;
end
end
end
end
end
% Collect the result
H=gplus(H,1); if build_aniso, Q=gplus(Q,1); end
end
Matlab seems to have no problems starting and connecting to the pool of workers;
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
Starting...
Starting parallel pool (parpool) using the 'local' profile ... connected to 12 workers.
[decoherence_naphthalene > create ] Spinach root directory determined to be /home/******/kec30/spinach_1.4.2114
Some background: I'm running this through an SGE queuing system (although I get the same problem running interactively). It always returns the error with "worker N" where N < # cores in parpool.
I'm running these simulations on Matlab R2014a (8.3.0.532) 64-bit (glnxa64) using code provided in spinach 1.4.2114. It's a 10 spin system (4^n matrix elements), and the code appears to start on my laptop but quickly maxes out my ram (8 GB, about 4 available for Matlab). On the cluster, I've tried reducing the memory dramatically by going from the full system (10 spin-coherences) to a greatly reduced system (3-spin coherences, with a distance cutoff) (which is probably to few), but I still encounter the same problem. Since this seems to be more of a Matlab error than a spinach error, I thought I'd ask here for advice.
2 Comments
Edric Ellis
on 8 Aug 2014
What is the type of 'H'? Can it be saved to a MAT file successfully (this is required as the communication system uses the same mechanism when transferring data)? Also, I note that Q is a cell array - I would not expect gplus(Q,1) to succeed as that's trying to add together each worker's version of 'Q' (but I think that would give you a different error message).
Answers (0)
See Also
Categories
Find more on Parallel and Cloud in Help Center and File Exchange
Products
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!