Distributed computing: job still 'running' when it has actually finished. Why?

I have been using MATLAB distributed computing (R2014b) for about two years. I have noticed several issues and am wondering what could be causing them. I am not sure whether they are related.
  1. The MATLAB GUI that lists jobs and their status hangs while updating the status and freezes MATLAB. I stopped using it because of that, but it seems like there should be a fix.
  2. [Behavior until this last month] When I checked the status of jobs from the MATLAB command line, they were still listed as 'running' even though the job had finished on the server side. I could ssh into our server and see that my job was no longer in the queue, but MATLAB still thought it was running. When I looked in the job folder on the server, the output file Task1.out.mat was full of my outputs, but the Job#.out.mat file outside the folder was empty.
  3. [Behavior this last month] Now when I check the status of jobs from the MATLAB command line, MATLAB sometimes hangs indefinitely or reports that the job is still running. If I ssh into the server, I can see that the tasks are still listed as running in the queue, but in some cases Task1.out.mat has already been written and contains my outputs while the process is still running. I now have to check the size of that output file to determine whether my job has finished, and then cancel the jobs manually; they do not stop on their own.
  4. [Behavior this last month] If I call diary() on a job that I know has finished (because all the outputs are in Task1.out.mat), the diary is incomplete. For example, one job I submitted had a for i = 1:10000 loop that printed progress every 100 iterations. The diary stops at iteration 6500, but when I load the output from Task1.out.mat, the array the loop was filling is complete.

Answers (1)

Sean de Wolski on 19 Jan 2018
Regarding 1): it has become much better and more stable in more recent releases.
EQ on 19 Jan 2018
Good to know; maybe I will try persuading the powers that be that it's time to upgrade. Any thoughts on 2-4?

Categories: MATLAB Parallel Server