Restart a parpool worker

Question

Raghavasimhan Thirunarayanan on 16 Jun 2020


Answered: Edric Ellis on 16 Jun 2020

Hello,

When I run parfor, a worker sometimes terminates with an error and the simulation continues with the remaining workers. Is there a way to automatically restart the parpool worker without having to stop and relaunch the simulation? I am at my wits' end as to how to achieve this.

Thanks

1 Comment

Mohammad Sami on 16 Jun 2020

See if this answer from Edric helps you.

https://www.mathworks.com/matlabcentral/answers/504693-how-to-restart-a-worker-in-parpool


Answer 1

Edric Ellis on 16 Jun 2020


Unfortunately, there's no simple way to do this when using parfor with parpool. I can think of a couple of workarounds that might help, depending very much on how your problem is set up.

Firstly, you could try the "cluster parfor" approach where you don't launch a parpool at all, and instead let the cluster run the loop directly. This is described in the doc here: https://www.mathworks.com/help/parallel-computing/parforoptions.html (See the section "Run parfor on a Cluster Without a Parallel Pool"). This approach launches independent tasks on your cluster rather than a parallel pool. This will only get decent performance if the time taken to launch the workers for the independent tasks is not significant compared to the time taken to run the entire loop. If it works for you, this is highly likely to be the simplest approach.
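As a rough illustration of the "cluster parfor" approach, here is a minimal sketch. It assumes a default cluster profile is already configured; the iteration count N and the loop body are placeholders, not part of the original question.

```matlab
% Sketch: run parfor directly on a cluster, without opening a parallel pool.
% Assumes a default cluster profile is configured on this machine.
cluster = parcluster();            % cluster object from the default profile
opts = parforOptions(cluster);     % options that tell parfor to use the cluster

N = 100;                           % hypothetical number of iterations
results = zeros(1, N);

parfor (idx = 1:N, opts)
    results(idx) = idx^2;          % stand-in for the real per-iteration work
end
```

Because each batch of iterations runs as an independent task, the startup cost of those tasks is paid on every run, which is why this only pays off when the loop itself is comparatively long.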

Secondly, if you can restructure your code to use parfeval instead of parfor, you could check the NumWorkers property of the parallel pool while consuming results, and if it decreases, restart the pool. This would be a bunch more work because you'd need to keep track of the incomplete work, and you'd have to re-submit it.
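A minimal sketch of the parfeval idea might look like the following. The function name runWithRestart, the restart cap, and the one-second polling interval are all assumptions made for illustration. Note that this sketch treats any error from fetchNext (including an error thrown by the work function itself) as a reason to restart, which may not be what you want.

```matlab
function results = runWithRestart(workFcn, N)
% runWithRestart  Hypothetical helper: evaluate workFcn(i) for i = 1:N using
% parfeval, restarting the pool and resubmitting unfinished work if the pool
% loses workers.
results = cell(1, N);
done = false(1, N);
maxRestarts = 3;                  % assumed cap so a persistent failure cannot loop forever
restarts = 0;

while ~all(done)
    pool = gcp();                 % reuse the current pool, or start a new one
    expected = pool.NumWorkers;   % worker count at submission time
    todo = find(~done);           % iterations that still need a result

    clear futures
    futures(1:numel(todo)) = parallel.FevalFuture;   % preallocate
    for k = 1:numel(todo)
        futures(k) = parfeval(pool, workFcn, 1, todo(k));
    end

    remaining = numel(todo);
    healthy = true;
    while healthy && remaining > 0
        try
            [k, value] = fetchNext(futures, 1);      % wait up to 1 second
            if ~isempty(k)                           % empty means the wait timed out
                results{todo(k)} = value;
                done(todo(k)) = true;
                remaining = remaining - 1;
            end
            healthy = pool.NumWorkers >= expected;   % has a worker been lost?
        catch
            healthy = false;                         % pool died or a future errored
        end
    end

    if ~all(done)
        restarts = restarts + 1;
        if restarts > maxRestarts
            error('runWithRestart:tooManyRestarts', ...
                'Giving up after %d restarts.', maxRestarts);
        end
        try
            cancel(futures);                         % best effort; pool may already be gone
        catch
        end
        delete(gcp('nocreate'));                     % next pass starts a fresh pool
    end
end
end
```

It could be called as, for example, `results = runWithRestart(@(i) i^2, 20);`. Results that were already fetched are kept across restarts, so only the unfinished iterations are resubmitted.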

A third approach might be to restructure your parfor loop to send its results back using a DataQueue. If you also launch the parpool with the 'SpmdEnabled', true option, the pool will automatically shut down any time a worker crashes. The idea is that the client stores the partial results of your loop as they arrive through the DataQueue. The parfor loop would terminate with an error when a worker crashes, but you'd have the partial results and would therefore be able to start a new pool and run a parfor loop over the incomplete iterations.
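The DataQueue idea could be sketched roughly as follows. The names runWithPartialResults, storeResult and doWork are hypothetical, as is the cap on attempts. A containers.Map is used only because it is a handle object that the afterEach callback can update on the client; any other client-side store would do.

```matlab
function store = runWithPartialResults(N)
% runWithPartialResults  Hypothetical helper: collect per-iteration results on
% the client through a DataQueue, and rerun only the missing iterations if the
% parfor loop fails.
store = containers.Map('KeyType', 'double', 'ValueType', 'any');
maxAttempts = 3;                        % assumed cap on the number of tries

for attempt = 1:maxAttempts
    % Iterations whose results have not yet reached the client.
    todo = setdiff(1:N, cell2mat(keys(store)));
    if isempty(todo)
        break;
    end

    if isempty(gcp('nocreate'))
        parpool('SpmdEnabled', true);   % pool shuts down if a worker crashes
    end

    q = parallel.pool.DataQueue;
    afterEach(q, @(msg) storeResult(store, msg));   % runs on the client

    try
        parfor k = 1:numel(todo)
            idx = todo(k);
            send(q, {idx, doWork(idx)});            % ship each result back immediately
        end
    catch err
        warning('runWithPartialResults:loopFailed', ...
            'Attempt %d failed: %s', attempt, err.message);
        delete(gcp('nocreate'));                    % ensure the next attempt gets a fresh pool
    end
end
% store may still be incomplete if every attempt failed.
end

function storeResult(store, msg)
% Record one {index, value} message in the handle-based map.
store(msg{1}) = msg{2};
end

function out = doWork(idx)
% Stand-in for the real per-iteration computation.
out = idx^2;
end
```

Any result that was still in flight when the loop failed is simply recomputed on the next attempt, since the list of remaining iterations is derived from what actually arrived on the client.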

