vectorized code is more time consuming than a simple for loop

Question

hosein bashi on 18 Sep 2022

Edited: John D'Errico on 18 Sep 2022

Hi.

I have a simple for loop, and I replaced it with a vectorized computation hoping to get a faster function, but the vectorized version is slower. Any idea why?

CODE:

p=    8.7390;
T=  791.6200;
a=tic;
for i=1:100000
    h2_pT(p, T);
end
toc(a)
b=tic;
for i=1:100000
    h2_pTnew(p, T);
end
toc(b)
function h2_pT = h2_pT(p, T)
Ir = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 6, 6, 6, 7, 7, 7, 8, 8, 9, 10, 10, 10, 16, 16, 18, 20, 20, 20, 21, 22, 23, 24, 24, 24];
Jr = [0, 1, 2, 3, 6, 1, 2, 4, 7, 36, 0, 1, 3, 6, 35, 1, 2, 3, 7, 3, 16, 35, 0, 11, 25, 8, 36, 13, 4, 10, 14, 29, 50, 57, 20, 35, 48, 21, 53, 39, 26, 40, 58];
nr = [-1.7731742473213E-03, -0.017834862292358, -0.045996013696365, -0.057581259083432, -0.05032527872793, -3.3032641670203E-05, -1.8948987516315E-04, -3.9392777243355E-03, -0.043797295650573, -2.6674547914087E-05, 2.0481737692309E-08, 4.3870667284435E-07, -3.227767723857E-05, -1.5033924542148E-03, -0.040668253562649, -7.8847309559367E-10, 1.2790717852285E-08, 4.8225372718507E-07, 2.2922076337661E-06, -1.6714766451061E-11, -2.1171472321355E-03, -23.895741934104, -5.905956432427E-18, -1.2621808899101E-06, -0.038946842435739, 1.1256211360459E-11, -8.2311340897998, 1.9809712802088E-08, 1.0406965210174E-19, -1.0234747095929E-13, -1.0018179379511E-09, -8.0882908646985E-11, 0.10693031879409, -0.33662250574171, 8.9185845355421E-25, 3.0629316876232E-13, -4.2002467698208E-06, -5.9056029685639E-26, 3.7826947613457E-06, -1.2768608934681E-15, 7.3087610595061E-29, 5.5414715350778E-17, -9.436970724121E-07];
R = 0.461526; %kJ/(kg K)
Pi = p;
tau = 540 / T;
gr_tau = 0;
for i = 1 : 43
    gr_tau = gr_tau + nr(i) * Pi ^ Ir(i) * Jr(i) * (tau - 0.5) ^ (Jr(i) - 1);
end
h2_pT = R * T * tau * (gr_tau);
end
function h2_pT = h2_pTnew(p, T)
Ir = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 6, 6, 6, 7, 7, 7, 8, 8, 9, 10, 10, 10, 16, 16, 18, 20, 20, 20, 21, 22, 23, 24, 24, 24];
Jr = [0, 1, 2, 3, 6, 1, 2, 4, 7, 36, 0, 1, 3, 6, 35, 1, 2, 3, 7, 3, 16, 35, 0, 11, 25, 8, 36, 13, 4, 10, 14, 29, 50, 57, 20, 35, 48, 21, 53, 39, 26, 40, 58];
nr = [-1.7731742473213E-03, -0.017834862292358, -0.045996013696365, -0.057581259083432, -0.05032527872793, -3.3032641670203E-05, -1.8948987516315E-04, -3.9392777243355E-03, -0.043797295650573, -2.6674547914087E-05, 2.0481737692309E-08, 4.3870667284435E-07, -3.227767723857E-05, -1.5033924542148E-03, -0.040668253562649, -7.8847309559367E-10, 1.2790717852285E-08, 4.8225372718507E-07, 2.2922076337661E-06, -1.6714766451061E-11, -2.1171472321355E-03, -23.895741934104, -5.905956432427E-18, -1.2621808899101E-06, -0.038946842435739, 1.1256211360459E-11, -8.2311340897998, 1.9809712802088E-08, 1.0406965210174E-19, -1.0234747095929E-13, -1.0018179379511E-09, -8.0882908646985E-11, 0.10693031879409, -0.33662250574171, 8.9185845355421E-25, 3.0629316876232E-13, -4.2002467698208E-06, -5.9056029685639E-26, 3.7826947613457E-06, -1.2768608934681E-15, 7.3087610595061E-29, 5.5414715350778E-17, -9.436970724121E-07];
R = 0.461526; %kJ/(kg K)
Pi = p;
tau = 540 / T;
gr_tau = sum( + nr.* Pi .^ Ir.* Jr .* (tau - 0.5) .^ (Jr - 1));
h2_pT = R * T * tau * ( gr_tau);
end

Answer 1

John D'Errico on 18 Sep 2022

Edited: John D'Errico on 18 Sep 2022

It is not uncommon to have vectorized code be slower than a brute force loop. Why?

The word "vectorization" can refer to many different coding paradigms. In a basic example, find the sum of the natural numbers 1 through 100.

n = 1:100;
sum(n)
ans =
        5050
N = 100;
S = 0;
for ind = 1:N
    S = S + ind;
end
S
S =
        5050

Of course, both schemes produce the same end result. One of them generates the set of numbers internally, putting them into memory. Then an internal loop is generated to form the sum. I call that an implicit loop, since you never see the loop itself, but it is there. In the latter case, we have an explicit or external loop.

Will the vectorized version be faster? Usually, yes. But suppose N were 1e9, or some number large enough that just populating that much memory with these numbers is itself CPU intensive. You may even force MATLAB to use virtual memory, swapping data in and out. So there is a point where the explicit loop may be better. (I recall exactly that case coming up in a recent response I wrote, but I write a LOT of responses...)
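As a rough sketch of that trade-off (my own illustration, with N kept deliberately small so it runs anywhere): the vectorized form has to store all N numbers as doubles, 8 bytes each, while the explicit loop only ever holds the running total. At N = 1e9 the same vector would need about 8 GB.

```matlab
N = 1e6;                 % small stand-in; N = 1e9 would need about 8 GB for n

% Vectorized: the whole vector 1:N is stored in memory first.
n = 1:N;
info = whos('n');
fprintf('vector of %d doubles occupies %d bytes\n', N, info.bytes);
Svec = sum(n);

% Explicit loop: only the scalar accumulator is stored.
Sloop = 0;
for k = 1:N
    Sloop = Sloop + k;
end

% Both agree with the closed form N*(N+1)/2.
fprintf('sum = %d, loop = %d, formula = %d\n', Svec, Sloop, N*(N+1)/2);
```

Which version wins at a given N depends on the machine and the MATLAB release, so treat this as something to measure, not a rule.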

Of course, there are many other variations of codes, vectorized and unvectorized, that will gain or not, depending on the problem size. Sometimes it is the dimensionality of the problem that costs you.

For example, suppose I wanted to find all sets of 6 integers that sum to 50? I could write it like this:

tic,
Sets = zeros(0,6);
for n1 = 1:50
  for n2 = n1:50
    for n3 = n2:50
      for n4 = n3:50
        for n5 = n4:50
          for n6 = n5:50
            if (n1 + n2 + n3+ n4+ n5+ n6) == 50
              Sets(end+1,:) = [n1, n2, n3, n4, n5, n6];
            end
          end
        end
      end
    end
  end
end
toc
Elapsed time is 0.096587 seconds.
size(Sets,1)
ans =
        5427

So we have nested for loops, 6 levels deep. UGH. That actually ran pretty quickly, though if I made the problem a little larger, my computer would grind to a halt. But it would terminate eventually. Totally unvectorized though.

Or, I could use ndgrid to generate ALL possible combinations of the numbers 1:45. Then isolate the ways they will add up to 50. Something like this:

tic,
elements = int16(1:45);
[N1,N2,N3,N4,N5,N6] = ndgrid(elements,elements,elements,elements,elements,elements);
ind = ((N1 + N2 + N3 + N4 + N5 + N6) == 50) & (N1 <= N2) & (N2 <= N3) & (N3 <= N4) & (N4 <= N5) & (N5 <= N6);
Sets = [N1(ind),N2(ind),N3(ind),N4(ind),N5(ind),N6(ind)];
toc
Elapsed time is 175.219414 seconds.
size(Sets,1)
ans =
        5427

Each array will contain 45^6 elements, so approximately 8.3e9 elements, each of which would be a double by default. (To be a little smart, I did this using int16 arrays.) Then find all combinations in those arrays that sum to 50. I tried this scheme on my computer (which has a fair amount of RAM) and almost gave up, with all 8 CPU cores going full tilt and my system fan running flat out for about 3 minutes.

Do you see the latter form is massively memory intensive, while the former is massively looped?
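To put numbers on that, here is a back-of-envelope sketch (not part of the original timing runs). It computes the storage for the six ndgrid outputs, then repeats both approaches on a scaled-down problem of my own choosing (3 nondecreasing integers from 1:8 that sum to 10) to confirm they return the same sets. Sizes are in decimal gigabytes and ignore the extra temporaries created by the sums and comparisons.

```matlab
% Memory cost of the full 6-D grid.
nElem = 45^6;                          % elements in each ndgrid output
gbPerInt16Array  = 2 * nElem / 1e9;    % int16 uses 2 bytes per element
gbPerDoubleArray = 8 * nElem / 1e9;    % double would use 8 bytes per element
fprintf('int16:  %.1f GB per array, %.1f GB for all six\n', ...
        gbPerInt16Array, 6 * gbPerInt16Array);
fprintf('double: %.1f GB per array\n', gbPerDoubleArray);

% Scaled-down problem, nested loops.
loopSets = zeros(0, 3);
for n1 = 1:8
    for n2 = n1:8
        for n3 = n2:8
            if n1 + n2 + n3 == 10
                loopSets(end+1, :) = [n1, n2, n3];
            end
        end
    end
end

% Scaled-down problem, ndgrid. Only 8^3 = 512 elements per array here.
e = int16(1:8);
[N1, N2, N3] = ndgrid(e, e, e);
ind = ((N1 + N2 + N3) == 10) & (N1 <= N2) & (N2 <= N3);
gridSets = double([N1(ind), N2(ind), N3(ind)]);

% Same solutions either way; only the row order may differ.
sameSets = isequal(sortrows(loopSets), sortrows(gridSets));
fprintf('%d sets found, identical: %d\n', size(gridSets, 1), sameSets);
```

At this tiny size both versions are effectively instant; the point is only that the grid grows as the sixth power of the range while the loop's storage stays proportional to the number of solutions.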

Another approach is to use my own partitions code, found on the FEX, where a recursive search is performed.

tic,P6 = partitions(50,1:45,6,6);toc
Elapsed time is 0.296554 seconds.
size(P6,1)
ans =
        5427

Again, there are only 5427 solutions there. It is actually slower than the brute force looped approach, but faster than the massive memory solution.
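For readers without the FEX download, the idea of a recursive search can be sketched as follows. This is NOT the partitions code itself: partitionsSketch is a hypothetical helper written for illustration, and it only handles the case of exactly k positive integer parts in nondecreasing order. It should find the same number of solutions as the loops above.

```matlab
% Script file; local functions in scripts need R2016b or later.
P = partitionsSketch(50, 6, 1);
fprintf('%d solutions found\n', size(P, 1));

function P = partitionsSketch(total, k, minPart)
% Rows of P are nondecreasing k-tuples of integers >= minPart that sum to total.
if k == 1
    % One part left: it must absorb the whole remainder.
    if total >= minPart
        P = total;
    else
        P = zeros(0, 1);
    end
    return
end
P = zeros(0, k);
% The smallest part can be at most floor(total/k); anything larger and the
% remaining k-1 parts, each at least as big, would overshoot the total.
for first = minPart:floor(total / k)
    rest = partitionsSketch(total - first, k - 1, first);
    P = [P; repmat(first, size(rest, 1), 1), rest];
end
end
```

The pruning in the loop bound is what separates this from the brute-force nest: branches that cannot reach the target are never entered.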

There are other variations of vectorized solutions to problems, surely more than I can count easily, because people call all sorts of things vectorizations. Sometimes, a solution that uses mathematics is a "vectorization". For example, if you want to form the sum of the numbers 1:n, you might just use the formula, known even to Gauss:

N = 100;
N*(N+1)/2
ans =
        5050

The point is, vectorized schemes vary widely. They are not always more efficient. Sometimes you may choose a vectorized scheme purely because it looks pretty. I can think of schemes to solve a problem that use tools like intlinprog or lsqlin or backslash, where the solution might then be called a vectorized solution.

What really matters is you understand what the vectorization is doing, and why it may or may not be efficient. Whether it actually gains will often be problem/parameter/size dependent. In the end, you gain when you understand the mathematics, when you understand the limitations of your computer, and when you understand how MATLAB will internally implement the code you write.

Answer 2

Chunru on 18 Sep 2022

With improvements to the execution engine and JIT compiler, newer versions of MATLAB have improved for-loop performance. Very often, for loops no longer slow down performance. However, vectorized code is more aligned with the matrix/array thinking that MATLAB promotes, and it usually gives more concise expressions.

1 Comment

dpb on 18 Sep 2022

Edited: dpb on 18 Sep 2022

I'd guess the extra time is in the overhead of the additional function call to sum() here, since the timed function is so small.

There's also an extra allocation, which may or may not affect the time much, in the temporary variable Pi used instead of p in the second function.

Also, was the timing done with code at the command window or as m-files? There's limited JIT optimization at the command line, so the m-file versions are needed.
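One way to take those effects out of the measurement is to put both versions in a file and let timeit handle warm-up and repetition. The sketch below is an assumption-laden stand-in: sumLoop and sumVec are hypothetical names, and the four coefficients are made up for brevity, not the 43 terms from the question.

```matlab
% Script file; local functions in scripts need R2016b or later.
% Stand-in data: four made-up terms, NOT the coefficients from the question.
c.Ir = [1 1 2 3];
c.Jr = [0 1 2 3];
c.nr = [-1.7e-3 -1.8e-2 -3.3e-5 2.0e-8];
p = 8.7390;
T = 791.62;

% timeit calls each handle many times and returns a typical time per call.
tLoop = timeit(@() sumLoop(p, T, c));
tVec  = timeit(@() sumVec(p, T, c));
fprintf('loop: %.3g s, vectorized: %.3g s\n', tLoop, tVec);

function g = sumLoop(p, T, c)
% Explicit loop over the terms.
tau = 540 / T;
g = 0;
for i = 1:numel(c.nr)
    g = g + c.nr(i) * p^c.Ir(i) * c.Jr(i) * (tau - 0.5)^(c.Jr(i) - 1);
end
end

function g = sumVec(p, T, c)
% Same sum written with element-wise operators and sum().
tau = 540 / T;
g = sum(c.nr .* p.^c.Ir .* c.Jr .* (tau - 0.5).^(c.Jr - 1));
end
```

With functions this small, timeit may warn that the measurement is near its resolution, which itself suggests that fixed call overhead dominates the cost.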
