distributed arrays slow with batch jobs
1 view (last 30 days)
Show older comments
Hi,
I am working with distributed arrays.
As far as I understood, I can create distributed arrays directly on a cluster. When I want to manipulate what is inside the distributed array, I need to use spmd.
I wanted to avoid any interactive pool. For this reason, I created a function that uses a distributed array, and send it to the cluster as a batch job. The function looks like
function R = my_distributed_function(input)
R = eye(N,'distributed' );
for k = 1 : N
for m = 1 :N
R(k,m) = 1 *m;
end
end
And I send this to the cluster as a batch job
job_distributed = batch(c,@my_distributed_function,1,{myinput},'Pool',N-1,'CurrentFolder','.','AutoAddClientPath',false);
However, it takes very long, around 64 seconds. The function without the "distributed" takes around 2 ms.
If I do not use the batch job, but keep the "distributed" option, the interactive pool starts. Then of course, it takes around 2 seconds, but there is the time to start the parallel pool.
My question is : why the batch job takes so long if I use a function that uses distributed arrays?
0 Comments
Accepted Answer
Thomas Falch
on 29 Oct 2021
A batch job with the 'Pool' option ( a "batch-pool job") will end up starting the equivalent of a interactive pool, but using one of the workers as a substitute for the MATLAB desktop client. The overall time for such a job will therefore be pool startup + the acutall work you're doing. In other words, it will take about the same amount of time as an interactive pool.
The main benefit of an batch-pool job is that you can submit the job to the cluster, and then shut down the MATLAB desktop client (and indeed the computer it's running on). Meanwhile, the job is running on the cluster, and you can come back much later to get the results. This is useful for long running jobs which don't require any user input (which is what interactive pools are for).
3 Comments
Thomas Falch
on 1 Nov 2021
This happens whenever you use the 'Pool' option to batch() (or equivalently using createCommunicatingJob()).
If you use parfor with batch() without the 'Pool' option (or equivalently using createJob/createTask), it will probably not work as you expect. It will not cause any kind of pool to be opened, and it will basically run as a regular for loop (on a single worker of your cluster).
This is the same behavior you would get if you try to run a parfor loop in the MATLAB desktop client without a interactive pool open and you have disabled the option to start up a parpool when you encounter a parfor (or don't have the Parallel Computing Toolbox installed).
More Answers (0)
See Also
Categories
Find more on Parallel Computing Fundamentals in Help Center and File Exchange
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!