How does "warm-up" overhead scale with data size or iteration count?

Everyone knows that when an M-file is run for the first time in a Matlab session it runs much slower than on subsequent runs. This "warm-up" effect is due to compiling the code with the accelerator, and probably many other things that I don't understand. But we all know to discard the first (or first few) results when timing the performance of a script.
My question is: is the warm-up time purely a constant overhead, or might long-running scripts suffer from it too? In other words, if I am running a long, complicated script, either on large data files or with a loop of many iterations, should I still "exercise" the code on a smaller problem before running? If so, will a
clear all
ruin the effort?
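For concreteness, the pattern I have in mind is something like this (a sketch only; processBigFile and the file names are hypothetical placeholders for your own long-running code):

```matlab
% Hypothetical sketch: exercise the code on a tiny problem first,
% then time the real run. "processBigFile" is a made-up name.
processBigFile('tiny_sample.dat');   % warm-up call: pays parse/JIT cost
tic
processBigFile('huge_dataset.dat');  % the timed run, now warmed up
toc
% A "clear all" between the two calls would presumably discard the warm-up.
```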

Answers (1)

clear all will ruin all previous "warm-up".
The warm-up should only need to be done once per function. However, if the calls you make to warm up the function do not happen to invoke all the sub-functions, then those might not be JIT'd.
I do not know whether JIT does all auxiliary functions in the same file when the main function is done. I would lean towards suspecting it does not JIT functions until they are needed.
I have no idea of the time at which methods in a classdef are JIT'd.
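As a hedged illustration of the sub-function point: a warm-up call that takes an early-exit path may never reach the helper sub-functions, so they could stay cold (mySim, its helpers, and its arguments are all hypothetical):

```matlab
% mySim.m is a hypothetical M-file containing subfunctions helperA, helperB.
mySim([]);          % cheap warm-up call - if it exits early, helpers may stay cold
mySim(smallInput);  % better: a small input that exercises the same code paths
tic, mySim(realInput); toc   % the timed run
```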

9 Comments

I've measured the parsing of a medium-sized M-file (1400 lines of code) with several subfunctions. It seems the JIT-parsing is done directly after reading the file for all subfunctions, regardless of whether the subfunctions are called or not. P-coding in v7 style does not reduce the warm-up time in my measurements. This could be an artifact caused by the overhead required for decrypting and unpacking the file.
Anyhow, Walter's main statement is true and exhaustive: clear all ruins the previous warm-up. +1
@Walter
Not always once per function (at least not in 2008):
"... JIT is sometimes multi-pass ... i.e., each time you run the m-file the JIT will do a little more optimization, so run times can vary from run to run because of this also"
From Steven Lord's comment at
My measurements have been crude only:
clear all; tic; runTheCommand; drawnow; toc
tic; runTheCommand; drawnow; toc
The 1400-line M-file creates a GUI and contains only very small loops (e.g. creating 8 buttons). The main time is spent creating the GUI objects, but for the measurements only the difference between the times matters. In addition, I've checked this after a restart of the machine, when the M-file is not yet in the hard-disk cache. The code does not profit substantially from the JIT (checked with "feature jit off; feature accel off").
Thanks, Malcolm. I do remember that thread, but I have been overlooking that particular comment of Steve's.
@Jan It sounds like that code's main overhead is in the JVM. I doubt the MATLAB JIT will affect that. MATLAB JIT and Hotspot VM JIT are entirely separate (AFAIK).
@Malcolm: The timings remain nearly the same when I clear() only the user-defined M-functions instead of the brute-force clear('all'). Therefore I guess it is the overhead of reading and parsing the M-code. Even the analysis by which the JIT decides it does not need to optimize will take some time. Anyhow, deeper investigations are not interesting, at least for me: as long as I can avoid "clear('all')", the first run of a function in a Matlab session is and will remain slower.
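Jan's comparison can be sketched like this (runTheCommand is the placeholder from the measurements above; `clear functions` discards compiled M-functions while leaving workspace variables alone):

```matlab
% Clearing only compiled M-functions should reproduce the warm-up
% penalty without wiping the workspace.
clear functions                     % discard parsed/compiled M-functions only
tic; runTheCommand; drawnow; toc    % slow again: functions must be re-parsed
tic; runTheCommand; drawnow; toc    % fast: warm-up restored
```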
Thanks for all the interesting info. Part of my original question remains unanswered though, maybe because it was unclear. Does the warm-up effect scale with data size too, rather than code size? For example, if I write some physical-system simulation and want to time it, I know that:
clear all
numTimeSteps=1;
tic
SimulateSystem(numTimeSteps);
t1=toc;
tic
SimulateSystem(numTimeSteps);
t2=toc;
t2 is now likely much less than t1. However, if:
clear all
numTimeSteps=1e6;
tic, SimulateSystem(numTimeSteps); t3=toc/1e6;
is t3 closer to t2 or to t1? Does the answer depend on the complexity of SimulateSystem()?
thanks, -n
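One way to frame the question above (a back-of-envelope model only, assuming warm-up really is a one-time constant cost c and each time step costs s):

```matlab
% If warm-up is purely constant, then approximately:
%   t1 ~= c + s,   t2 ~= s,   t3 ~= (c + 1e6*s) / 1e6
c = t1 - t2;                      % estimated one-time overhead (seconds)
s = t2;                           % steady-state cost of one time step
t3_predicted = (c + 1e6*s)/1e6;   % -> s = t2 as the iteration count grows
% If the measured t3 is much larger than this prediction, warm-up is not
% purely constant (e.g. multi-pass JIT or size-dependent dispatch).
```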
Given the quotation about incremental JIT, I would suspect that Yes, data size does matter.
There are a number of MATLAB operations or code patterns which MATLAB knows how to implement in terms of calls to LAPACK and similar highly optimized (and multi-threaded) routines. There is, though, overhead in repackaging the inputs for the routines and unpackaging the outputs from the routines (the routines do not use the same storage order conventions that MATLAB does.) MATLAB holds off on calling the routines until the problem size is big enough that even including those overheads the library routines will be faster.
I figure then that if you were to exercise the code with a "small" dataset, then that dataset might not be large enough for MATLAB to decide to call out to those routines, and thus that the large-problem code might not get JIT'd into place until the code is run with a sufficiently large problem.
Unfortunately, "how big" is something we do not know: "about 10,000 elements" for simple vectorized routines, possibly much smaller for routines that do complicated calculations. MathWorks does not document the breakpoints, and the breakpoints change between releases (and possibly even with processor details, as the optimization advantage is processor-specific).
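One rough way to probe such a breakpoint empirically (an assumed approach, not documented behavior): time the same vectorized operation at increasing sizes and look for a jump in per-element speed where MATLAB switches over to the multithreaded library path.

```matlab
% Probe per-element cost of a simple elementwise operation at several sizes.
for n = [1e3 1e4 1e5 1e6]
    x = rand(n, 1);
    f = @() sin(x);
    t = timeit(f);   % timeit is built in from R2013b; earlier, average tic/toc runs
    fprintf('n = %7d : %.2f ns/element\n', n, 1e9*t/n);
end
```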
@Walter
I just spotted that the multi-pass comment was from James Tursa not Steven Lord. For the comments above:
"Storage order conventions" are the same for LAPACK/BLAS routines (Fortran-based, column-major). These are already heavily optimized and cannot benefit from JIT. Neither can any MATLAB built-ins/mex-files, as I understand it, so vectorized code will not benefit from JIT either. The biggest hit there is because of copy-by-value passing to Java and matrix creation for the LHS with mex (using a pointer from the RHS to return results instead speeds up code no end with large matrices, but has risks - see http://undocumentedmatlab.com/blog/matlab-mex-in-place-editing/).
Storage order remains important (for vectorized as well as non-vectorized code) because accessing data in a contiguous block increases the chance of operations being done in cache (see http://www.mathworks.co.uk/company/newsletters/news_notes/june07/patterns.html). So the order of indexing in loops remains an issue (whether JIT optimizes those I do not know - if it did, the returned results would change due to IEEE rounding).
With no documentation we can only guess at the factors the MATLAB JIT uses. The HotSpot compiler switches give a clue to what factors any JIT system might consider (http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html).



Asked: 30 May 2012
