continuing a script after interruption

Hi,
Imagine you have a big script running that takes a very long time to execute.
Then you need to shut the machine down.
You hit CTRL+C, which interrupts the script, and you note the line on which it was interrupted.
You type "save", which saves all the workspace variables.
The machine reboots, you launch MATLAB, and you "load" your variables back.
The question is: is there a way to restart the script from the line where it was interrupted? (That line may be INSIDE a for loop!) Since all the variables are back with their correct values, it should be possible, but I don't see how to do it...
Can someone help me?
thanks
antoine

Accepted Answer

Walter Roberson on 6 Jul 2012
No, MATLAB does not have any method of "checkpointing" program state and restarting it later. You would need to write in the functionality yourself.
Sometimes the easiest way to achieve checkpointing is to write in "transaction processing" style with a State Machine. Instead of having a series of loops and routines that call routines that call routines, etc., you flatten out the structure. You have a variable that contains information about the current major and minor "state" (what is being done now), and the routine branches on the major state to blocks of code that know about minor states, and so on, eventually getting to a code location where the current state is fully decoded, at which point the code calls a routine that does just enough work to change state, with the new state being returned upward. Then you go through the next iteration, the state is decoded all over again, and so on. When enough work has been done to be "worthwhile", perhaps when the code figures out that it is at the end of a substate, the code writes out all the information about the current state and current variables.
With this structure, continuing becomes easy: just restore the last saved state with variables, and feed that into the state machine. The state machine won't even notice that there was a downtime in-between (but you might have to re-open files you were working with.)
There are some kinds of problems where a State Machine works out very naturally, compact and efficient and flexible. It is, however, not a programming structure that most people are comfortable with.
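As a rough illustration of this structure, here is a minimal sketch. It is not code from a real project: the state names, the checkpoint file name checkpoint.mat and the toy computation (a sum of squares) are all assumptions made up for the example.

```matlab
% Minimal sketch of a checkpointed state machine (all names are hypothetical).
ckptFile = 'checkpoint.mat';           % assumed checkpoint file name

if exist(ckptFile, 'file')
    S = load(ckptFile);                % resume: restore the last saved state
    st = S.st;
else
    % Fresh start: the struct holds the state AND the working variables.
    st = struct('major', 'accumulate', 'k', 1, 'total', 0, 'result', []);
end

while ~strcmp(st.major, 'done')
    switch st.major
        case 'accumulate'
            % Do just enough work to move to the next minor state (k).
            st.total = st.total + st.k^2;
            st.k = st.k + 1;
            if st.k > 100
                st.major = 'finalize'; % end of this major state
            end
        case 'finalize'
            st.result = sqrt(st.total);
            st.major = 'done';
    end
    % Checkpoint when "worthwhile": every 10 steps and at each state change.
    if mod(st.k, 10) == 0 || ~strcmp(st.major, 'accumulate')
        save(ckptFile, 'st');
    end
end
```

Running the script again after an interruption picks up from the last saved st, repeating at most the few steps done since that save. In real use it is probably safer to save to a temporary file and rename it afterwards, so that an interruption during save cannot corrupt the checkpoint.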

More Answers (5)

John Petersen on 5 Jul 2012
At various points (before sections that take a long time to execute) you could add tests that determine whether that portion of the code still needs to run. For example, you could set flags or test whether certain variables have already been created. For loops, you could test whether the loop variable exists and, if it does, start the loop from its previous value. If you have more than one loop, make sure each loop uses a unique variable so that you can identify it correctly. You should clear all variables before loading the old data file.
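A minimal sketch of this idea for a single loop follows. The file name loop_checkpoint.mat, the variable names iDone and results, and the toy computation are assumptions for illustration only.

```matlab
% Minimal sketch of a resumable loop (file and variable names are hypothetical).
ckptFile = 'loop_checkpoint.mat';      % assumed checkpoint file name
nIter = 50;

if exist(ckptFile, 'file')
    load(ckptFile, 'iDone', 'results');    % resume from the saved progress
else
    iDone = 0;                             % nothing completed yet
    results = zeros(1, nIter);
end

for k = iDone+1 : nIter
    results(k) = k^2;                  % stand-in for the long computation
    iDone = k;                         % mark iteration k as complete
    save(ckptFile, 'iDone', 'results');
end
```

Saving on every iteration only makes sense when one iteration is slow compared with the save. Otherwise something like saving every N iterations is probably a better trade-off.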
  3 Comments
Antoine Liutkus on 6 Jul 2012
yes, I am in a loop
thanks for your answers


Antoine Liutkus on 6 Jul 2012
OK, so basically it's not possible..
I think it's a pity
thanks for your answers anyways
  1 Comment
Walter Roberson on 6 Jul 2012
It is one of those issues that turns out to be substantially more difficult than is obvious, and gets more difficult and complicated still as you increase the software and hardware optimization.
For example, consider a simple sequence such as
A = rand(); %or any non-constant value
A = A + 1.5;
A = A * 17.2;
The straightforward interpretation is
Calculate a value into floating point register #1 (FP1)
Store FP1 into the memory location for A
Load from the memory location for A and store in FP1
Add 1.5 to FP1 storing into FP1
Store FP1 into the memory location for A
Load from the memory location for A and store in FP1
Multiply FP1 by 17.2 storing into FP1
Store FP1 into the memory location for A
With this structure, if you know how far you got and you have the current memory location for A, you only have to "back up" at most one instruction (re-load FP1 from A) to continue on.
But the above is inefficient. Much more efficient would be
Calculate a value into floating point register #1 (FP1)
Add 1.5 to FP1 storing into FP1
Multiply FP1 by 17.2 storing into FP1
Store FP1 into the memory location for A
But if you do this kind of optimization, you have to keep careful track of where you got to. As the calculations get more complex, involving more variables, you will often run into situations where you end up storing some of the variables into memory locations but keeping other variables in the floating point registers for efficiency, and you will end up doing so in patterns that make it impossible to restart by backing up, unless the code took the care to remember the "original" values of each affected memory location.
Now go multi-core, hyper-threaded, GPU, and add in I/O calls that should not be duplicated ("Deduct $10000 from bank account. Now do it again because I'm not sure whether it went through before or not.")


Darin on 8 Jan 2018
Hi All,
I'd really like to have some "stop and continue" built into MATLAB myself (another deficiency vs. Mathematica... for those who just say it can't be done), but I have built some tools to work around the lack.
Take a look at Darin_dbtools on File Exchange. If you make a habit of calling break_place_button at the start of your code, and break_place from within loops or between heavy compute sections of your code, you will be able to stop and continue interactively even when you didn't specifically plan to in advance. Other included tools let you add calculations or display to conditional breakpoints, so that you can change how the code runs while in debug mode and when it continues.
Of course... if you don't remember to put these hooks in up front... you're sunk.
Sorry, not everything you wanted, but close enough to help me retain my sanity in debugging some very long-running programs.
I've asked MathWorks to consider implementing a "next" option for the line # in dbstop, which would allow this to work without explicit calls to break_place in your code... but no joy as of yet.
  1 Comment
Mdaatniel on 30 Jan 2019
If you want "next", you can use the MATLAB command dbstep
If you want MATLAB to stop at a given line number (say, 42), just use dbstop in filename.m at 42
And once it has stopped at line 42, type dbclear in filename.m at 42
Thanks for the reference to Darin_dbtools


Arunkumar M on 8 Nov 2018
You can use try/catch statements for this specific problem.
I have often used this in scripts that analyze data from multiple files. In this case, if the data in one particular file causes an error that interrupts the analysis, the script continues the analysis with the rest of the files.
Example:
for i = 1:length(FileList)
    try
        % Analysis of file data for iteration number i
        % code for analysis of file data
    catch
        % continue execution for rest of files
        continue;
    end
end
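
A slightly fuller sketch of the same pattern is shown here, recording which files failed and why instead of skipping them silently. FileList is filled with made-up names, and analyzeFile is a hypothetical stand-in for the real analysis that deliberately fails on one file.

```matlab
% Minimal sketch: record which files failed instead of silently skipping them.
% FileList and analyzeFile are hypothetical stand-ins for the real data and analysis.
FileList = {'a.csv', 'bad.csv', 'c.csv'};
analyzeFile = @(name) assert(~strcmp(name, 'bad.csv'), ...
    'demo:badFile', 'Cannot analyze %s', name);
failed = {};                            % names of the files that raised an error

for k = 1:numel(FileList)
    try
        analyzeFile(FileList{k});       % stand-in for the analysis of file k
    catch err                           % err is an MException object
        failed{end+1} = FileList{k};    %#ok<SAGROW>
        fprintf('Skipping %s: %s\n', FileList{k}, err.message);
    end
end
```

Note that try/catch handles errors thrown by the code. A CTRL+C interruption is not caught this way, so this pattern helps with bad input files rather than with the reboot scenario in the question.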

Mdaatniel on 30 Jan 2019
Of course you can try. The following might work if you did not fiddle too much with MATLAB internals (handles, ...). Try these steps:
(1) Carefully note the stack of called functions shown by dbstack.
(2) Optional step, if you want to try to restore figures and if your code needed breakpoints: save any figures with saveas, and also save what dbstatus returns with save.
(3) Alternate save (to a separate .mat file for each workspace) and dbup until you reach the "base workspace". Press Enter immediately after each dbup, until MATLAB fixes the relevant bug. Then, after the reboot, run MATLAB in a virtual machine (because you learned the lesson the hard way, didn't you?).
(4) For the last line shown by step (1), do this:
(4.1) Type the following, with 42 replaced by the line number to restore and filename.m replaced by the file name in the last line shown by step (1): dbstop in filename.m at 42
(4.2) Load the last .mat file.
(4.3) Retype the exact function call that you used for the initial, interrupted run.
(4.4) It should stop at line 42. Type dbclear in filename.m at 42
(5) For all other lines of the dbstack output shown at step (1), starting from the bottom:
(5.1) Type the following, with 18 replaced by the line number to restore and filename.m replaced by the file name in the current line of the dbstack output from step (1): dbstop in filename.m at 18
(5.2) Load the relevant .mat file.
(5.3) Type dbcont
(5.4) It should stop at line 18. Type dbclear in filename.m at 18
(6) Optional, if you needed step (2): load each figure and the .mat file from dbstatus, and manually feed the latter to dbstop.
(7) Type dbcont. If you are unhappy in love, this will work.
  1 Comment
Mdaatniel on 30 Jan 2019
This is not perfect, but it will spare you much of the work in the loops that were unfinished at the time of the interruption.
