How do I prevent my Red Hat instance from losing SSH access when running spmd or parpool using more than N workers?
MathWorks Support Team
on 31 Oct 2023
Answered: MathWorks Support Team
on 2 Nov 2023
I am trying to run a few simple lines of code that create a local cluster and then repeatedly run an "spmd" block inside a "while" loop until the loop finishes. However, my code fails to run on any of my compute instances if I use more than 44 workers. This is my code:
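pclus = parcluster('local');   % default local cluster profile
nc = 64;                       % number of workers to start
parpool(pclus, nc);
ii = 0;
while ii < 100
    spmd
        % perform some task here
    end
    ii = ii + 1;               % advance the counter so the loop terminates
end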
After "ii = 38" or so, commands such as "top", "htop" and "ls" that I run in another shell session on the instance start to fail. Right after this, I lose SSH access to the instance as well, and Rescale, the platform on which the instance runs, reports it as an "unhealthy instance" and shuts it down.
Throughout this, MATLAB itself keeps running. Why does this happen?
Accepted Answer
MathWorks Support Team
on 31 Oct 2023
This issue is caused by the maximum user processes limit ("ulimit -u"). On Linux instances, you can raise this limit with the command:
ulimit -u 16384
However, this changes the limit only for that shell session and the processes started from it. To apply the new limit to all future sessions, change the maximum user processes value in the file "/etc/security/limits.d/20-nproc.conf".
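A minimal sketch, assuming "20-nproc.conf" uses the standard pam_limits format (<domain> <type> <item> <value>): the helper below prints replacement entries that you can write into that file as root. The function name "nproc_conf" and the value 16384 are illustrative. The new value applies only to sessions opened after the change (log out and back in, then check with "ulimit -u"), and a soft limit cannot be raised above the hard limit (check with "ulimit -Hu").

```shell
#!/bin/sh
# Sketch: print pam_limits entries that raise the soft "max user processes"
# (nproc) limit. nproc_conf and the default of 16384 are hypothetical
# choices for this example, not part of any tool.
nproc_conf() {
  nproc_value="${1:-16384}"
  # limits.conf format: <domain> <type> <item> <value>
  # "*" matches every user; root stays unlimited.
  printf '%-10s %-7s %-9s %s\n' '*' soft nproc "$nproc_value"
  printf '%-10s %-7s %-9s %s\n' root soft nproc unlimited
}

# Example (as root): back up the current file, then replace it.
#   cp /etc/security/limits.d/20-nproc.conf /root/20-nproc.conf.bak
#   nproc_conf 16384 > /etc/security/limits.d/20-nproc.conf
nproc_conf 16384
```

The backup is kept outside "limits.d" so that it cannot be picked up as an extra configuration file.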
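The default value on your instance is only enough for the processes started by about 44 workers. Once the limit is reached, the MATLAB processes that are already running are unaffected, but no new processes can be started for your user. This is why commands such as "ls" fail and SSH access is lost while MATLAB continues to run.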
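To check whether this limit is what the workers are exhausting, you can compare the number of tasks your user owns with the soft limit while the pool is running. This is a sketch that assumes a Linux "/proc" filesystem. On Linux, the max user processes limit (RLIMIT_NPROC) counts threads as well as processes, and each MATLAB worker is a separate process with its own threads, so the count can be much larger than the number of workers. The 90% warning threshold is an arbitrary choice.

```shell
#!/bin/sh
# Sketch: report how close the current user is to the soft "max user
# processes" (RLIMIT_NPROC) limit. On Linux this limit counts threads,
# so every entry under /proc/<pid>/task is counted.

uid="$(id -u)"

# Soft limit inherited by this shell: field 3 of the "Max processes" line.
limit="$(awk '/^Max processes/ {print $3}' /proc/self/limits)"

# Count thread entries owned by this UID. Directory ownership reflects the
# effective UID and includes this script's own helpers, so it is approximate.
used="$(find /proc/[0-9]*/task -mindepth 1 -maxdepth 1 -user "$uid" 2>/dev/null | wc -l | tr -d ' ')"

echo "uid=$uid tasks=$used soft_limit=$limit"

if [ "$limit" != "unlimited" ] && [ "$used" -ge $((limit * 9 / 10)) ]; then
  echo "warning: at or above 90% of the nproc soft limit" >&2
fi
```

Running this from a second session while increasing the number of workers should show the task count approaching the soft limit shortly before commands start to fail.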