I realise that the speed of arrayfun versus for-loops has been discussed before, but I remain puzzled by why the difference is as huge as it is in certain circumstances, and I wonder whether this is seen as an issue by the community or by MathWorks.
First an example - my questions are at the end. Define a simple class
classdef testclass
    properties
        p
    end
    methods
        function tc = testclass
            tc.p = pi;
        end
    end
end
a function that extracts a property value from a vector of objects of the class using a loop
function pvec = loopTest(tcvec)
    pvec = double(size(tcvec));
    for i = 1:length(tcvec)
        pvec(i) = tcvec(i).p;
    end
end
and one that does the same using arrayfun.
function pvec = arrayfunTest(tcvec)
    pvec = arrayfun(@(t) t.p, tcvec);
end
and then run some timing code:
tc = testclass;
tcvec = repmat(tc, 1, 1000);
tloop = timeit(@() loopTest(tcvec));
tarrayfun = timeit(@() arrayfunTest(tcvec));
fprintf("arrayfun takes %f times as long as a loop\n", tarrayfun/tloop);
which on my Windows PC running R2024a, prints (typically)
arrayfun takes 52.565968 times as long as a loop
A factor of 50! This effect accounts for some of the remarkably long run times that were plaguing my app.
One standard explanation of the difference is that arrayfun has to do an extra function call, so let's see if that's what is causing this, by putting an extra function call into the loop:
function pvec = loopFunTest(tcvec)
    pvec = double(size(tcvec));
    f = @(t) t.p;
    for i = 1:length(tcvec)
        pvec(i) = f(tcvec(i));
    end
end
and repeating the timing exercise:
tc = testclass;
tcvec = repmat(tc, 1, 1000);
tloopfun = timeit(@() loopFunTest(tcvec));
tarrayfun = timeit(@() arrayfunTest(tcvec));
fprintf("arrayfun takes %f times as long as a loop with a function\n", tarrayfun/tloopfun);
which now prints typically
arrayfun takes 6.163437 times as long as a loop with a function
So yes, the extra function call does have an effect. But there's still a factor of 6 between the loop and arrayfun - that's still huge, and enough to make arrayfun more or less useless. This is a pity - more than a pity, a serious nuisance - as I like to use it, and my code is littered with examples, since it was a natural go-to before I discovered this problem.
I've tried to make my examples clear enough to illustrate the core of the problem. My two questions are these:
  1. Out of curiosity, how do you make arrayfun run 6 times more slowly than a loop? I realise, of course, that ultimately it is implemented as a loop, but that only means we might not expect it to be a lot faster. But even the most naive implementation should allow it to be about as fast as the loop - so what is going on to produce the spectacular slowdown?
  2. What are the factors that interact with arrayfun to make it slow? Without doing masses of tests, how can I know which parts of my code need to be rewritten as loops? A sense of when it's slow and when it isn't would be really useful.

6 Comments

I don't have an answer to either of your questions. I just wanted to point out two things:
1. double(size(tcvec)) does not pre-allocate pvec properly (which I imagine was the intent):
tc = testclass;
tcvec = repmat(tc, 1, 1e5);
pvec = double(size(tcvec))
pvec = 1x2
1 100000
tcvec = tc;
pvec = double(size(tcvec))
pvec = 1x2
1 1
So pvec is 1x2, and the subsequent for loop is appending elements to it from the third iteration.
Instead, use something like pvec = zeros(size(tcvec)).
(Of course, with proper pre-allocation of pvec, I would expect the for loop function to run faster than before, further highlighting the relative slowness of arrayfun in this case.)
2. I realize you are presenting only an illustrative example, but I think it's worth pointing out that you can do pvec=[tcvec.p] to achieve the same result as the other functions, and faster than any of them.
tc = testclass;
T = table();
for N = 10.^(0:5)
    tcvec = repmat(tc, 1, N);
    thorzcat = timeit(@() horzcatTest(tcvec));
    tloop = timeit(@() loopTest(tcvec));
    tloopfun = timeit(@() loopFunTest(tcvec));
    tarrayfun = timeit(@() arrayfunTest(tcvec));
    T{end+1,:} = [N thorzcat tloop tloopfun tarrayfun];
end
T.Properties.VariableNames = ["N" "horzcat" "loop" "loop w/function" "arrayfun"]
T = 6x5 table
       N       horzcat        loop       loop w/function    arrayfun
    _____    __________    __________    _______________    __________
        1     4.832e-06     8.332e-06       1.4767e-05      4.5832e-05
       10    5.7046e-06     7.269e-06       4.6416e-05      0.00023461
      100    1.5514e-05    2.8119e-05       0.00042633       0.0026568
     1000    0.00011444    0.00020987        0.0037048        0.020697
    10000     0.0014448     0.0021738         0.037797         0.20722
    1e+05      0.014322      0.022383          0.36893          2.0506
function pvec = loopTest(tcvec)
    % pvec = double(size(tcvec));
    pvec = zeros(size(tcvec));
    for i = 1:length(tcvec)
        pvec(i) = tcvec(i).p;
    end
end

function pvec = arrayfunTest(tcvec)
    pvec = arrayfun(@(t) t.p, tcvec);
end

function pvec = loopFunTest(tcvec)
    % pvec = double(size(tcvec));
    pvec = zeros(size(tcvec));
    f = @(t) t.p;
    for i = 1:length(tcvec)
        pvec(i) = f(tcvec(i));
    end
end

function pvec = horzcatTest(tcvec)
    pvec = [tcvec.p];
end
"But even the most naive implementation should allow it to be about as fast as the loop "
No, that is not the case.
In general (on CPU, the case might be different for GPU version of arrayfun, which you could try using if you have the Parallel Computing Toolbox), arrayfun will be slower than a for loop with preallocated output as it has the overhead of calling the function in each iteration. The for loop also has opportunities for optimizations between statements that the arrayfun (or cellfun) would not have.
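That per-call overhead can be seen in isolation with a small experiment along these lines (a sketch; the helper names are mine, and timings vary by machine and release):

```matlab
function callOverheadDemo
% Sketch: the same arithmetic, with and without a function-handle call
% on every iteration. Function names are illustrative, not from the thread.
    x = rand(1, 1e5);
    f = @(v) v + 1;
    tDirect = timeit(@() directLoop(x));
    tHandle = timeit(@() handleLoop(x, f));
    fprintf("handle-per-iteration loop takes %f times as long\n", tHandle/tDirect);
end

function y = directLoop(x)
    y = zeros(size(x));
    for i = 1:numel(x)
        y(i) = x(i) + 1;    % inlined statement
    end
end

function y = handleLoop(x, f)
    y = zeros(size(x));
    for i = 1:numel(x)
        y(i) = f(x(i));     % one extra function call per element
    end
end
```

The gap between tDirect and tHandle is pure call overhead, since both loops do the same work per element.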
As for factors affecting the speed of arrayfun - a major effect can come from the function being used, i.e., in addition to the overhead of calling the function, there is also the overhead of executing it. One commonly used function in arrayfun with considerable overhead is find.
"Without doing masses of tests, how can I know which parts of my code need to be rewritten as loops? A sense of when it's slow and when it isn't would be really useful."
That is something that comes with experience. I learnt it the long way myself - https://in.mathworks.com/matlabcentral/answers/1793315-to-extract-the-last-sub-string-from-strings#comment_2343625
As you can see in the thread attached above, I had (naively) used arrayfun with find and the resulting performance is not good at all.
Sometimes there are better methods than a for loop as well - horzcat in this case as @Voss has shown above and sscanf as @Stephen23 has shown in the example linked.
And as Stephen advised, I have followed this forum regularly and I continue to learn from it, even from my own mistakes. It will take some practice. I would suggest you follow Stephen's advice as well.
Hope this helps.
Many thanks @Voss and @Dyuman Joshi.
Voss - yes, I slipped up and should indeed have written zeros not double. I'm cross with myself for the error, but as you say, it makes little difference to the timings. And also yes, for this example horzcat applied to a comma-separated list is the approach to take.
Dyuman - good point that I'm only thinking here about CPUs and not GPUs. But I don't really understand your statement that a naive implementation of arrayfun can't be as fast as a loop - simply because that implementation would be just to execute the exact same loop inside arrayfun. There'll only be one function call extra, which you won't notice. The overhead factors you mention surely apply equally to the loop and arrayfun, with the exception of the extra necessary function call on each iteration with arrayfun, which I allowed for in my timing tests.
Thank you both again.
Out of curiosity, how do you make arrayfun run 6 times more slowly than a loop? I realise, of course, that ultimately it is implemented as a loop, but that only means we might not expect it to be a lot faster.
That description, "ultimately it is implemented as a loop", is doing what I'd call some heavy lifting. Consider the handling of error cases. This first example works.
a = arrayfun(@(x) zeros(x), [1 1 1])
a = 1x3
0 0 0
This next one doesn't. But it gives an informative error message when it fails, highlighting exactly which element of the array over which you're iterating caused the error. If you tried to do this in a "naive" for loop, the message wouldn't be the same as for arrayfun and (without some effort on the part of the author of the loop, which would slow down the loop a little) it wouldn't highlight which iteration failed in the error message.
a = arrayfun(@(x) zeros(x), [1 1 2])
Error using arrayfun
Non-scalar in Uniform output, at index 3, output 1.
Set 'UniformOutput' to false.
That checking of the output of your function likely isn't expensive in an absolute sense, but it isn't free either. And if the operation you're performing is fast, that validation of the function output could be a relatively expensive part of the arrayfun call.
So arrayfun isn't "just a for loop." A naive implementation might be, but the MATLAB implementation isn't that naive.
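For comparison, a hand-written loop that mimics that diagnostic might look something like this (a sketch; the error identifier and message wording are mine, not MATLAB's):

```matlab
function pvec = checkedLoop(f, tcvec)
% Sketch: a loop that, like arrayfun, reports which element failed.
% The check itself costs a little time on every iteration.
    pvec = zeros(size(tcvec));
    for i = 1:numel(tcvec)
        v = f(tcvec(i));
        if ~isscalar(v)
            error("checkedLoop:NonUniform", ...
                "Non-scalar output at index %d. Outputs must be scalar.", i);
        end
        pvec(i) = v;
    end
end
```

Even this minimal check illustrates the trade-off: per-iteration validation buys better diagnostics at some cost in speed.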
Thank you @Steven Lord. I hadn't thought about diagnostics, and yes, the arrayfun error message is much more helpful than what my own code provides. (In fact, what my implementation says in your example is just "Unable to perform assignment because the left and right sides have a different number of elements." I'd have to put in some work to provide the detail, and there would be some effect on performance.)
However, it's a trade-off, and I'd rather have the faster execution than the more informative message in this case - but I agree it's by no means obvious that will always be the right choice.
As you also point out, the amount of time taken by the function affects the significance of the loop overhead. My very minimal function perhaps unfairly emphasised the cost of the overhead, but I address this to some extent below.
Having talked about a simple loop-based reimplementation, I thought I ought to put my money where my mouth is, so my version of arrayfun, called applyFunction, is attached. It doesn't have all the functionality of arrayfun (for example it doesn't provide for more than one result) though I think most of it could be added without significant overhead.
This makes it easier to do speed comparisons. Also attached is a modified version of my test class which has a method that provides a more substantial computational load for each function call, so it's a little more realistic as I'm not timing pure overhead. Here's an example, with a bit more computation going on than before:
tc = testclass;
tcvec = repmat(tc, 1, 1000);
nrep = 1000; % controls amount of work each function call does
t1 = timeit(@() arrayfun(@(t) t.slowdown(nrep), tcvec));
t2 = timeit(@() applyFunction(@(t) t.slowdown(nrep), tcvec));
fprintf("arrayfun takes %f times as long as applyFunction\n", t1/t2);
which for me typically gives a factor of 3 in speed, for example:
arrayfun takes 2.958038 times as long as applyFunction
As you'd expect, the time ratio falls towards 1 if nrep is increased. On the other hand it climbs to 8 or 9 as it's decreased. But that case matters: I hit the problem precisely because my application was doing a simple thing to a large array of objects.
Bottom line: I still think it's worrying that my simple applyFunction (written entirely in user-level MATLAB code) can go so much faster than MATLAB's arrayfun when iterating over an object array.
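For readers without the attachment, a minimal loop-based stand-in along the lines described might look like this (a sketch of the idea only; the name, signature, and details are mine, and the attached applyFunction may differ):

```matlab
function result = applyFunctionSketch(f, a, uniform)
% Sketch: a minimal loop-based replacement for arrayfun, single output only.
% uniform = true mimics the default UniformOutput behaviour (scalar outputs);
% uniform = false collects outputs in a cell array.
    if nargin < 3, uniform = true; end
    if uniform
        result = zeros(size(a));
        for i = 1:numel(a)
            result(i) = f(a(i));
        end
    else
        result = cell(size(a));
        for i = 1:numel(a)
            result{i} = f(a(i));
        end
    end
end
```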
"simply because that implementation would be just to execute the exact same loop inside arrayfun."
"The overhead factors you mention surely apply equally to the loop and arrayfun, "
That is not what is being done in the for loop, though.
@Voss has provided an example of running a loop with a function handle and calling it in every iteration, and as you can see it is slower than the simple for loop.
"... , which you won't notice."
That is not true. There is always some overhead to calling a function, which as I mentioned depends on the function being called.


 Accepted Answer

Joss Knight on 14 Apr 2024

I wish it were a cleverer answer, but I'm afraid that it's simply that MATLAB has been heavily optimized for for loops over the years but the same optimizations have not been applied to arrayfun. In this case by far the most important optimization will be multithreading, so perhaps you have 32 virtual cores to work with.

It's tempting to just convert the arrayfun implementation to a for loop internally, but as others have implied, the devil is in the detail, since it would take some effort to get identical, backwards-compatible behaviour.

Best just to think of arrayfun as syntactic sugar, to be used only where performance is not critical. Or on the GPU, where it has a special implementation.

6 Comments

I wonder if there are cases where code performance is not a critical factor (barring, of course, initial learning about programming and function syntax).
There are many. For instance, I don't know, when constructing a singleton object you want to iterate over its arguments in a neat concise way that makes your code easier to read like
arrayfun(@mustBeTextScalar, varargin(1:2:end));
arrayfun(@mustBeNumeric, varargin(2:2:end));
Sometimes inefficiency is genuinely less important than readability. But admittedly, you need to be cautious.
"Sometimes inefficiency is genuinely less important than readability."
I guess I'll understand this when I face it myself ¯\_(ツ)_/¯

Joss Knight: thanks for the clear answer. However, it's a pity that MATLAB does not optimize arrayfun on the CPU. The advantage of arrayfun compared to loops is that arrayfun tells MATLAB that the order of execution does not matter. If you write a for loop instead, MATLAB does not know whether the iterations are independent. So in principle MATLAB could parallelize arrayfun. On the GPU, arrayfun is very fast.

I'm not sure that's a factor. MathWorks' for loop optimizations are amazingly good at detecting when loops are independent - determining loop dependency is actually something that happens quite early in the language parsing.
There are absolutely things that you could do with arrayfun on CPU to exploit vector instructions and other hardware optimizations, but probably no way to beat what for already does without damaging backwards compatibility. There's certainly no CPU equivalent, though, of what we can do on the GPU, because there we take your function and compile it with a completely different compiler explicitly targeting the CUDA hardware. We can get away with this because we did it from the outset, and don't have to support any of the other behaviours that CPU arrayfun does (like loop dependency, performing vector operations, writing to up-level variables, and a million other things).
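For completeness, the GPU path mentioned here looks like this in practice (requires Parallel Computing Toolbox and a supported GPU; the element-wise function is what gets compiled for the device):

```matlab
% Sketch: arrayfun applied to a gpuArray, where MATLAB compiles the
% element-wise function into a GPU kernel rather than looping on the CPU.
g = gpuArray(rand(1, 1e6));
r = arrayfun(@(x) 1./(1 + x.^2), g);  % runs as a compiled kernel on the GPU
result = gather(r);                   % bring the result back to the CPU
```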

Joss: thanks for your answer, totally makes sense

