I realise that the speed of arrayfun versus for-loops has been discussed before, but I remain puzzled by why the difference is as huge as it is in certain circumstances, and I wonder whether this is seen as an issue by the community or by MathWorks.
First an example - my questions are at the end. Define a simple class
classdef testclass
    properties
        p
    end
    methods
        function tc = testclass
            tc.p = pi;
        end
    end
end
a function that extracts a property value from a vector of objects of the class using a loop
function pvec = loopTest(tcvec)
    pvec = double(size(tcvec));
    for i = 1:length(tcvec)
        pvec(i) = tcvec(i).p;
    end
end
and one that does the same using arrayfun.
function pvec = arrayfunTest(tcvec)
    pvec = arrayfun(@(t) t.p, tcvec);
end
and then run some timing code:
tc = testclass;
tcvec = repmat(tc, 1, 1000);
tloop = timeit(@() loopTest(tcvec));
tarrayfun = timeit(@() arrayfunTest(tcvec));
fprintf("arrayfun takes %f times as long as a loop\n", tarrayfun/tloop);
which on my Windows PC running R2024a, prints (typically)
arrayfun takes 52.565968 times as long as a loop
A factor of 50! This effect accounts for some of the remarkably long run times that were plaguing my app.
One standard explanation of the difference is that arrayfun has to do an extra function call, so let's see if that's what is causing this, by putting an extra function call into the loop:
function pvec = loopFunTest(tcvec)
    pvec = double(size(tcvec));
    f = @(t) t.p;
    for i = 1:length(tcvec)
        pvec(i) = f(tcvec(i));
    end
end
and repeating the timing exercise:
tc = testclass;
tcvec = repmat(tc, 1, 1000);
tloopfun = timeit(@() loopFunTest(tcvec));
tarrayfun = timeit(@() arrayfunTest(tcvec));
fprintf("arrayfun takes %f times as long as a loop with a function\n", tarrayfun/tloopfun);
which now prints typically
arrayfun takes 6.163437 times as long as a loop with a function
So yes, the extra function call does have an effect. But there's still a factor of 6 between the loop and arrayfun - that's still huge, and enough to make arrayfun more or less useless. This is a pity - more than a pity, a serious nuisance - as I like to use it, and my code is littered with examples, since it was a natural go-to before I discovered this problem.
I've tried to make my examples clear enough to illustrate the core of the problem. My two questions are these:
  1. Out of curiosity, how do you make arrayfun run 6 times more slowly than a loop? I realise, of course, that ultimately it is implemented as a loop, but that only means we might not expect it to be a lot faster. But even the most naive implementation should allow it to be about as fast as the loop - so what is going on to produce the spectacular slowdown?
  2. What are the factors that interact with arrayfun to make it slow? Without doing masses of tests, how can I know which parts of my code need to be rewritten as loops? A sense of when it's slow and when it isn't would be really useful.

6 Comments

I don't have an answer to either of your questions. I just wanted to point out two things:
1. double(size(tcvec)) does not pre-allocate pvec properly (which I imagine was the intent):
tc = testclass;
tcvec = repmat(tc, 1, 1e5);
pvec = double(size(tcvec))
pvec = 1x2
1 100000
tcvec = tc;
pvec = double(size(tcvec))
pvec = 1x2
1 1
So pvec is 1x2, and the subsequent for loop is appending elements to it from the third iteration.
Instead, use something like pvec = zeros(size(tcvec)).
(Of course, with proper pre-allocation of pvec, I would expect the for loop function to run faster than before, further highlighting the relative slowness of arrayfun in this case.)
2. I realize you are presenting only an illustrative example, but I think it's worth pointing out that you can do pvec=[tcvec.p] to achieve the same result as the other functions, and faster than any of them.
tc = testclass;
T = table();
for N = 10.^(0:5)
    tcvec = repmat(tc, 1, N);
    thorzcat = timeit(@() horzcatTest(tcvec));
    tloop = timeit(@() loopTest(tcvec));
    tloopfun = timeit(@() loopFunTest(tcvec));
    tarrayfun = timeit(@() arrayfunTest(tcvec));
    T{end+1,:} = [N thorzcat tloop tloopfun tarrayfun];
end
T.Properties.VariableNames = ["N" "horzcat" "loop" "loop w/function" "arrayfun"]
T = 6x5 table
       N       horzcat        loop       loop w/function    arrayfun
    _____    __________    __________    _______________    __________
        1     4.832e-06     8.332e-06       1.4767e-05      4.5832e-05
       10    5.7046e-06     7.269e-06       4.6416e-05      0.00023461
      100    1.5514e-05    2.8119e-05       0.00042633       0.0026568
     1000    0.00011444    0.00020987        0.0037048        0.020697
    10000     0.0014448     0.0021738         0.037797         0.20722
    1e+05      0.014322      0.022383          0.36893          2.0506
function pvec = loopTest(tcvec)
    % pvec = double(size(tcvec));
    pvec = zeros(size(tcvec));
    for i = 1:length(tcvec)
        pvec(i) = tcvec(i).p;
    end
end

function pvec = arrayfunTest(tcvec)
    pvec = arrayfun(@(t) t.p, tcvec);
end

function pvec = loopFunTest(tcvec)
    % pvec = double(size(tcvec));
    pvec = zeros(size(tcvec));
    f = @(t) t.p;
    for i = 1:length(tcvec)
        pvec(i) = f(tcvec(i));
    end
end

function pvec = horzcatTest(tcvec)
    pvec = [tcvec.p];
end
"But even the most naive implementation should allow it to be about as fast as the loop "
No, that is not the case.
In general (on CPU, the case might be different for GPU version of arrayfun, which you could try using if you have the Parallel Computing Toolbox), arrayfun will be slower than a for loop with preallocated output as it has the overhead of calling the function in each iteration. The for loop also has opportunities for optimizations between statements that the arrayfun (or cellfun) would not have.
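That per-call overhead can be seen in isolation with a small experiment along these lines (a sketch; the helper names are mine, and timings vary by machine and release):

```matlab
function callOverheadDemo
% Sketch: the same arithmetic, with and without a function-handle call
% on every iteration. Function names are illustrative, not from the thread.
    x = rand(1, 1e5);
    f = @(v) v + 1;
    tDirect = timeit(@() directLoop(x));
    tHandle = timeit(@() handleLoop(x, f));
    fprintf("handle-per-iteration loop takes %f times as long\n", tHandle/tDirect);
end

function y = directLoop(x)
    y = zeros(size(x));
    for i = 1:numel(x)
        y(i) = x(i) + 1;    % inlined statement
    end
end

function y = handleLoop(x, f)
    y = zeros(size(x));
    for i = 1:numel(x)
        y(i) = f(x(i));     % one extra function call per element
    end
end
```

The gap between tDirect and tHandle is pure call overhead, since both loops do the same work per element.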
As for factors affecting the speed of arrayfun - a major effect can come from the function being used, i.e., in addition to the overhead of calling the function, there is also the overhead of executing it. One commonly used function in arrayfun with considerable overhead is find.
"Without doing masses of tests, how can I know which parts of my code need to be rewritten as loops? A sense of when it's slow and when it isn't would be really useful."
That is something that comes with experience. I learnt it the long way myself - https://in.mathworks.com/matlabcentral/answers/1793315-to-extract-the-last-sub-string-from-strings#comment_2343625
As you can see in the thread attached above, I had (naively) used arrayfun with find and the resulting performance is not good at all.
Sometimes there are better methods than a for loop as well - horzcat in this case as @Voss has shown above and sscanf as @Stephen23 has shown in the example linked.
And as Stephen advised, I have followed this forum regularly and I continue to learn from it, even from my own mistakes. It will take some practice. I would suggest you follow Stephen's advice as well.
Hope this helps.
Many thanks @Voss and @Dyuman Joshi.
Voss - yes, I slipped up and should indeed have written zeros not double. I'm cross with myself for the error, but as you say, it makes little difference to the timings. And also yes, for this example horzcat applied to a comma-separated list is the approach to take.
Dyuman - good point that I'm only thinking here about CPUs and not GPUs. But I don't really understand your statement that a naive implementation of arrayfun can't be as fast as a loop - simply because that implementation would be just to execute the exact same loop inside arrayfun. There'll only be one function call extra, which you won't notice. The overhead factors you mention surely apply equally to the loop and arrayfun, with the exception of the extra necessary function call on each iteration with arrayfun, which I allowed for in my timing tests.
Thank you both again.
Out of curiosity, how do you make arrayfun run 6 times more slowly than a loop? I realise, of course, that ultimately it is implemented as a loop, but that only means we might not expect it to be a lot faster.
That description, "ultimately it is implemented as a loop", is doing what I'd call some heavy lifting. Consider the handling of error cases. This first example works.
a = arrayfun(@(x) zeros(x), [1 1 1])
a = 1x3
0 0 0
This next one doesn't. But it gives an informative error message when it fails, highlighting exactly which element of the array over which you're iterating caused the error. If you tried to do this in a "naive" for loop, the message wouldn't be the same as for arrayfun and (without some effort on the part of the author of the loop, which would slow down the loop a little) it wouldn't highlight which iteration failed in the error message.
a = arrayfun(@(x) zeros(x), [1 1 2])
Error using arrayfun
Non-scalar in Uniform output, at index 3, output 1.
Set 'UniformOutput' to false.
That checking of the output of your function likely isn't expensive in an absolute sense, but it isn't free either. And if the operation you're performing is fast, that validation of the function output could be a relatively expensive part of the arrayfun call.
So arrayfun isn't "just a for loop." A naive implementation might be, but the MATLAB implementation isn't that naive.
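For comparison, a hand-written loop that mimics that diagnostic might look something like this (a sketch; the error identifier and message wording are mine, not MATLAB's):

```matlab
function pvec = checkedLoop(f, tcvec)
% Sketch: a loop that, like arrayfun, reports which element failed.
% The check itself costs a little time on every iteration.
    pvec = zeros(size(tcvec));
    for i = 1:numel(tcvec)
        v = f(tcvec(i));
        if ~isscalar(v)
            error("checkedLoop:NonUniform", ...
                "Non-scalar output at index %d. Outputs must be scalar.", i);
        end
        pvec(i) = v;
    end
end
```

Even this minimal check illustrates the trade-off: per-iteration validation buys better diagnostics at some cost in speed.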
Thank you @Steven Lord. I hadn't thought about diagnostics, and yes, the arrayfun error message is much more helpful than what my own code provides. (In fact, what my implementation says in your example is just "Unable to perform assignment because the left and right sides have a different number of elements." I'd have to put in some work to provide the detail, and there would be some effect on performance.)
However, it's a trade-off, and I'd rather have the faster execution than the more informative message in this case - but I agree it's by no means obvious that will always be the right choice.
As you also point out, the amount of time taken by the function affects the significance of the loop overhead. My very minimal function perhaps unfairly emphasised the cost of the overhead, but I address this to some extent below.
Having talked about a simple loop-based reimplementation, I thought I ought to put my money where my mouth is, so my version of arrayfun, called applyFunction, is attached. It doesn't have all the functionality of arrayfun (for example it doesn't provide for more than one result) though I think most of it could be added without significant overhead.
This makes it easier to do speed comparisons. Also attached is a modified version of my test class which has a method that provides a more substantial computational load for each function call, so it's a little more realistic as I'm not timing pure overhead. Here's an example, with a bit more computation going on than before:
tc = testclass;
tcvec = repmat(tc, 1, 1000);
nrep = 1000; % controls amount of work each function call does
t1 = timeit(@() arrayfun(@(t) t.slowdown(nrep), tcvec));
t2 = timeit(@() applyFunction(@(t) t.slowdown(nrep), tcvec));
fprintf("arrayfun takes %f times as long as applyFunction\n", t1/t2);
which for me typically gives a factor of 3 in speed, for example:
arrayfun takes 2.958038 times as long as applyFunction
As you'd expect, the time ratio falls towards 1 if nrep is increased. On the other hand it climbs to 8 or 9 as it's decreased. But that case matters: I hit the problem precisely because my application was doing a simple thing to a large array of objects.
Bottom line: I still think it's worrying that my simple applyFunction (written entirely in user-level MATLAB code) can go so much faster than MATLAB's arrayfun when iterating over an object array.
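For readers without the attachment, a minimal loop-based stand-in along the lines described might look like this (a sketch of the idea only; the name, signature, and details are mine, and the attached applyFunction may differ):

```matlab
function result = applyFunctionSketch(f, a, uniform)
% Sketch: a minimal loop-based replacement for arrayfun, single output only.
% uniform = true mimics the default UniformOutput behaviour (scalar outputs);
% uniform = false collects outputs in a cell array.
    if nargin < 3, uniform = true; end
    if uniform
        result = zeros(size(a));
        for i = 1:numel(a)
            result(i) = f(a(i));
        end
    else
        result = cell(size(a));
        for i = 1:numel(a)
            result{i} = f(a(i));
        end
    end
end
```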
"simply because that implementation would be just to execute the exact same loop inside arrayfun."
"The overhead factors you mention surely apply equally to the loop and arrayfun, "
That is not what is being done in the for loop, though.
@Voss has provided an example of running a loop with a function handle and calling it in every iteration, and as you can see it is slower than the simple for loop.
"... , which you won't notice."
That is not true. There is always some overhead to calling a function, which as I mentioned depends on the function being called.


 Accepted Answer

Joss Knight on 14 Apr 2024

I wish it were a cleverer answer, but I'm afraid that it's simply that MATLAB has been heavily optimized for for loops over the years but the same optimizations have not been applied to arrayfun. In this case by far the most important optimization will be multithreading, so perhaps you have 32 virtual cores to work with.

It's tempting to just convert the arrayfun implementation to a for loop internally, but as others have implied, the devil is in the detail, since it would take some effort to get identical, backwards-compatible behaviour.

Best just to think of arrayfun as syntactic sugar, to be used only where performance is not critical. Or on the GPU, where it has a special implementation.

6 Comments

I wonder if there are cases where code performance is not a critical factor (barring, of course, initial learning about programming and function syntax).
There are many. For instance, I don't know, when constructing a singleton object you want to iterate over its arguments in a neat concise way that makes your code easier to read like
arrayfun(@mustBeTextScalar, varargin(1:2:end));
arrayfun(@mustBeNumeric, varargin(2:2:end));
Sometimes inefficiency is genuinely less important than readability. But admittedly, you need to be cautious.
"Sometimes inefficiency is genuinely less important than readability."
I guess I'll understand this when I face it myself ¯\_(ツ)_/¯

Joss Knight: thanks for the clear answer. However, it's a pity that MATLAB does not optimize arrayfun on the CPU. The advantage of arrayfun compared to loops is that arrayfun tells MATLAB that the order of execution does not matter. If you write a for loop instead, MATLAB does not know whether the iterations are independent. So in principle MATLAB could parallelize arrayfun. On the GPU, arrayfun is very fast.

I'm not sure that's a factor. MathWorks' for loop optimizations are amazingly good at detecting when loops are independent - determining loop dependency is actually something that happens quite early in the language parsing.
There are absolutely things that you could do with arrayfun on CPU to exploit vector instructions and other hardware optimizations, but probably no way to beat what for already does without damaging backwards compatibility. There's certainly no CPU equivalent, though, of what we can do on the GPU, because there we take your function and compile it with a completely different compiler explicitly targeting the CUDA hardware. We can get away with this because we did it from the outset, and don't have to support any of the other behaviours that CPU arrayfun does (like loop dependency, performing vector operations, writing to up-level variables, and a million other things).
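For completeness, the GPU path mentioned here looks like this in practice (requires Parallel Computing Toolbox and a supported GPU; the element-wise function is what gets compiled for the device):

```matlab
% Sketch: arrayfun applied to a gpuArray, where MATLAB compiles the
% element-wise function into a GPU kernel rather than looping on the CPU.
g = gpuArray(rand(1, 1e6));
r = arrayfun(@(x) 1./(1 + x.^2), g);  % runs as a compiled kernel on the GPU
result = gather(r);                   % bring the result back to the CPU
```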

Joss: thanks for your answer, totally makes sense

