Why is a function call from within a function much, much slower than not using a function?

Here is a simple piece of code that shows it:
function c=myfunc(a,b)
c=a*b;
end
function c=main_temp()
a(1)=5;
a(2)=8;
r=1e7;
tic
for i=1:r
c=myfunc(a(1),a(2));
end
toc
tic
for i=1:r
c=a(1)*a(2);
end
toc
end
Here are the timings using tic/toc:
Elapsed time is 0.327171 seconds.
Elapsed time is 0.009264 seconds.
It's about 40 times slower, for no apparent reason except function-call overhead, and it gets much worse when using classes (about 15 times slower than the function call).
I searched extensively around the web and read every performance- and guidance-related post.
Also, this is a very simple piece of code that demonstrates a core problem.
  2 Comments
Guillaume
Guillaume on 2 Apr 2019
As Jan says, your second for loop constantly overwrites the same variable. An efficient JIT compiler may simply decide to execute just the last step of the loop, since all the other steps are irrelevant. However, it can't do that if you call a function, since it doesn't know what happens inside the function.
To make sure that this is not what happens, this would probably be a better test:
function main_temp()
r=1e7;
a = rand(1, r);
b = rand(1, r);
c = zeros(1, r);
tic
for i=1:r
c(i)=myfunc(a(i),b(i));
end
toc
d = zeros(1, r);
tic
for i=1:r
d(i)=a(i)*b(i);
end
toc
end
function c=myfunc(a,b)
c=a*b;
end
>> main_temp;
Elapsed time is 0.183169 seconds.
Elapsed time is 0.044830 seconds.
Still a big impact, but not as big as what you show. Function calls do have an inherent overhead and will prevent some optimisations. The nitty-gritty of it is undocumented in MATLAB and subject to change from version to version.
Yair Yakoby
Yair Yakoby on 2 Apr 2019
Edited: Yair Yakoby on 3 Apr 2019
Edit: From different tests we can see that both Jan and you are correct about the JIT optimizations.
However, it's still 50-100 times slower using classes vs. inline, and 6-10 times slower using functions vs. inline. That is still a lot in general, and a lot more than Python (which, as I said, is only about twice as slow for functions and classes compared to inline).
The JIT just widens the gap.
It translates to around 10 seconds for 1e7 calls. I know that may not sound like much, but when writing a complex OOP system this can add up to hours of (compared to Python, unjustified) runtime.
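For reference, a minimal sketch of the kind of class benchmark behind these numbers (the class and its name are made up for illustration; classdef code must live in its own file, e.g. Multiplier.m):

```matlab
% Multiplier.m -- hypothetical minimal value class
% (a classdef must be in its own file):
%
%   classdef Multiplier
%       methods
%           function c = mul(~, a, b)
%               c = a*b;
%           end
%       end
%   end

% Benchmark: method call vs. inline, same pattern as the question:
obj = Multiplier();
a = 5; b = 8; r = 1e6;
tic; for i = 1:r, c = obj.mul(a, b); end; toc   % method call
tic; for i = 1:r, c = a*b;           end; toc   % inline
```

The absolute times depend on the machine and MATLAB version; only the ratio between the two loops is meaningful.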


Accepted Answer

Jan
Jan on 2 Apr 2019
Edited: Jan on 2 Apr 2019
This is not a problem; it shows that MATLAB's JIT acceleration works powerfully. With the inlined code it seems to recognize that c is overwritten with the same values repeatedly. Maybe the JIT exploits the fact that the same indices are used, so that it does not access the array dynamically but reuses the values directly.
None of this can happen if the code to be evaluated is stored in a separate function. The smaller the function is, the fewer possibilities the JIT has to accelerate the code.
Note that the JIT is not documented and is subject to frequent changes. Therefore the idea of why exactly it works better on the inlined code is an educated guess only.
Trying to improve such tiny artificial parts of the code is called "premature optimization". This is a typical anti-pattern in programming. Prefer to write clean and working code, and let the profiler find the bottlenecks at the end. In a real-world program you can also observe the advantage of inlined code, but the effect is usually smaller. Above all, try not to create too many tiny subfunctions.
MATLAB's object-oriented classes are known to be not particularly fast.
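To put that profiler advice into practice, a minimal sketch (assuming the main_temp function from the question is on the path):

```matlab
profile on       % start collecting per-line timing data
main_temp();     % run the code under investigation
profile viewer   % open the report: time is broken down per function and line
```

The viewer shows where the time is actually spent, which is a far more reliable guide than micro-benchmarking single operations with tic/toc.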
  4 Comments
Rob Hurt
Rob Hurt on 10 Sep 2021
Hi Jan,
I'm trying to teach myself the best programming practices, and I'd like to better understand to what extent I should use functions vs. scripts. It seems like you come out strongly in favor of functions in these two answers (https://www.mathworks.com/matlabcentral/answers/31963-functions-vs-script#answer_40489) (https://www.mathworks.com/matlabcentral/answers/348746-is-it-better-to-have-one-function-or-multiple-instances-of-one-code#answer_274179), but above you point out some limitations. Could you expand a bit?
Also, I find that when I put code into functions, I end up needing to pass a lot of variables in and out to accomplish what I need. Here's an example of one of my recent function calls: [Imgs, Imgs_RGB, Np, xsize_ds, zsize_ds] = procImgs(Imi, iZd, disp_depth, disp_crange, noise_disp, colorm). Is this common, or am I doing something silly? All of these variables are in the base workspace. Is there an easy way to tell a function to use that workspace instead of its own?
Jan
Jan on 10 Sep 2021
Edited: Jan on 10 Sep 2021
It is a good strategy to post a new problem as a new question, not as a comment on an answer to another question. Here your question will get less attention.
If a piece of code works reliably and is useful, it will be reused to solve other problems as well. If you split the job of a program into separate parts, you can develop and test them one by one. You know this behaviour from all the available MATLAB functions, which are well tested and documented.
A pile of scripts cannot be tested exhaustively. If the code grows, you will lose track of which command is responsible for the last change of a variable. Imagine these scripts:
% scriptA.m
a = 10 * i
b(1:3) = 17
If you run this, the imaginary value 10i and the vector [17, 17, 17] are created. Now you might decide to run this script beforehand:
% scriptB.m
for i = 1:5
b(i) = i;
end
Now scriptA creates the values 50 and [17, 17, 17, 4, 5].
As soon as your program gets larger than 1000 lines of code, it is impossible to keep track of such problems. A pile of scripts with 10'000 lines of code can no longer be maintained, because it is impossible to control the side effects of modifications.
On the other hand, if each functional part of the code is written as its own function, you can test it and then forget what happens inside. Whether you use new variables or modify the algorithm does not matter, as long as the output matches exactly what is advertised in the documentation. If you call MATLAB's sin() function, you do not spend a second thinking about how it is implemented, but can concentrate on your own problem.
Sharing workspaces is equivalent to using global variables. When a frequently used global variable has an unexpected value, it is extremely hard to find out which piece of code made the last change. With functions and well-defined inputs and outputs, this is very easy.
[Imgs, Imgs_RGB, Np, xsize_ds, zsize_ds] = ...
procImgs(Imi, iZd, disp_depth, disp_crange, noise_disp, colorm)
Use structs and meaningful (or completely abstract) names for variables. What about combining disp_depth, disp_crange and noise_disp into a struct "output" with the fields depth, crange and noise? Do xsize_ds and zsize_ds belong together? Then combine them.
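A sketch of that grouping (the struct names and the choice of which parameters to bundle are of course up to you; the new signature for procImgs is hypothetical):

```matlab
% Bundle the related display parameters into one struct:
output.depth  = disp_depth;
output.crange = disp_crange;
output.noise  = noise_disp;

% Bundle the two downsampled sizes:
size_ds.x = xsize_ds;
size_ds.z = zsize_ds;

% The call then shrinks to something like:
% [Imgs, Imgs_RGB, Np, size_ds] = procImgs(Imi, iZd, output, colorm);
```

The point is not fewer characters but fewer independent names crossing the function boundary, which makes the data flow easier to follow.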
The idea is to create "objects" which are comparable to the real-world objects processed by the code. I've written a large tool for clinical decision making, which has grown over decades to some 100'000 lines of code. For passing arguments it uses just a few structs: Person, Examination, Model, Job, Sheets. Looking at the list of inputs and outputs, I can tell directly whether a specific function is allowed to modify the data or only to read it. If any deeply nested subfunction needs access to a specific piece of information, I can find out in seconds whether it is available, or through which intermediate functions I have to add the access to it. My function for creating the output sheets as PDF needs access to the "Person" object to display its name, but it cannot modify the name. So I can let a co-worker modify the output function and be 100% sure that the data concerning the Person are not destroyed or invalidated by any code. With a pile of scripts such a guarantee is impossible: any line of code can influence almost any other line anywhere in the program.
Of course, when you start to write a program, you will not know which pieces will be needed in the end, or which objects represent the processed data best. Usually programs have a few dozen lines at the beginning and solve a tiny piece of work. When the code grows, new requirements will appear, and therefore "refactoring" is a usual step in programming: restructure, or even restart the programming from scratch. If some important parts of the code can be reused for the new version, you have saved a lot of programming and debugging time.
You are already working with a huge code base that is organized as separate functions: all toolboxes of MATLAB are written as functions (or even better: as object-oriented classes). Imagine that sin(), input(), axes() and plot() were scripts which pollute the base workspace with thousands of variables. Calling them in a loop would already be impossible:
for k = 1:10
plot(k, 17);
disp(sin(k));
end
You would have to check the code of plot() and all called subroutines to see whether they modify the value of k. So this simple loop would take several hours of checking whether it runs as expected.
A frequently occurring problem in this forum is shadowing builtin functions:
max = 17;
max(1:10) % Error: Index exceeds the number of array elements
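As an aside, the which command reveals what a name currently refers to, which helps to diagnose such shadowing:

```matlab
max = 17;
which max        % reports that max is a variable
clear max        % remove the shadowing variable
which max -all   % lists the builtin definition(s) of max again
```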
Now imagine that somewhere in a script such a redefinition was done and you simply forgot where and why. If all code is encapsulated in functions, you will know exactly where to search for the reason, and it is usually easy to change the locally used name. In a pile of scripts it will be hard to find out where the function max() is meant and where the variable with the same name.
So the problem is the exponentially growing complexity of code. Experienced programmers can keep the overview over 10'000 lines of code. 100'000 lines are a magic limit for code that can be maintained without automatic version control. For larger programs you need automatic test tools to control the effects of a bug fix in a specific function.
Scripts prevent your code from growing beyond a certain limit, because the complexity explodes. Other pitfalls are global variables, missing documentation, omitted unit testing, and a too complicated design of the objects and variables. Code, data and GUI should also be strictly separated.
Almost all scientists use computers for their work, and many have to write specific software to solve new problems and learn programming languages to do so. But software engineering is massively underestimated. It is standard that a PhD student writes some code which nobody can use later due to missing flexibility and documentation. Unmaintainable code can be found in large companies as well.
So even if your problem could be solved with 12 scripts, do yourself the favour and move the parts into functions. Learning good programming habits will save you a lot of hours, days or even weeks of debugging.


More Answers (1)

Yair Yakoby
Yair Yakoby on 3 Apr 2019
Thanks for the answers. I'm frustrated as to why OO in MATLAB is so slow. I think the JIT is not the only cause; the overhead of functions and methods also contributes, but it's pretty hard to determine, since every timing difference can be attributed to the JIT.
You might say that comparing to Python is unfair, but Python is also interpreted and shows no difference when using classes vs. inline operations, and it's not as if the JIT makes MATLAB's inline code faster than Python's inline code.
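For reference, the Python side of that comparison can be measured with a sketch like this (absolute numbers depend on the machine; only the ratio between the two loops matters):

```python
import timeit

def myfunc(a, b):
    return a * b

a, b = 5, 8
r = 10**6

# Time r calls through a function vs. the same work written inline.
t_func = timeit.timeit("myfunc(a, b)", globals=globals(), number=r)
t_inline = timeit.timeit("a * b", globals=globals(), number=r)

print(f"function: {t_func:.3f}s  inline: {t_inline:.3f}s  "
      f"ratio: {t_func / t_inline:.1f}x")
```

In CPython the function version is typically a small constant factor slower than the inline multiplication, which matches the "about twice as slow" ballpark mentioned above.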

Release

R2018b
