How to multiply a vector with each column of a matrix most efficiently?
Show older comments
Hey there,
I have a vector t (nx1) and a matrix A (nxm). I need to multiply t with each column of A element-wise.
Example: t=[ 1; 2 ]; A=[ 1 2; 3 4 ] => Res=[ 1*1 1*2; 2*3 2*4 ]
So far I have tried 4 versions:
% 1. Looping
for j = 1 : m
res( j ) = t .* A( :, j );
end
% 2. Repmat
tmat = repmat( t, 1, m )
res = tmat .* A;
% 3. diag
res = diag( t ) * A;
% 4. Indexing
A = A( : );
t = t( :, ones( 1, m ) );
t = t( : );
res = A .* t;
res = reshape( res, n, m );
I did some testing on these and found #1 to be (by far) the fastest one, followed by #2 and #4 (about 2 to 3 times slower) and then followed by #3 (8 to 9 times slower). This kind of surprised me, since #1 is the only version not taking advantage of matrix operations in some way.
Does anyone know a faster way to do this? Or could anybody give some reasons for the result I found?
3 Comments
Andrew Newell
on 8 May 2011
What platform and version of MATLAB are you using?
Eike
on 9 May 2011
Mehrdad Isfandbod
on 30 Mar 2018
I am sorry, but you haven't defined "m" on the first looping have you ?
Accepted Answer
More Answers (5)
David
on 21 Jun 2017
I just encountered this problem myself, and found all these answers to be very helpful. However, after poking a little bit more, I discovered that MATLAB 2016b and later allow implicit multiplication to do exactly the same thing as bsxfun, slightly faster.
res = A .* t or res = t .* A are equivalent and solve the problem, as long as t is a vector with the same number of rows OR columns as A.
The other answers were of course correct at the time the question was asked. This page is still being viewed ~1000 times/month, so I wanted to get the current best answer out there.
clearvars;
A = rand(1e4,2e4);
[Nr,Nc] = size(A);
v = rand(Nr,1);
tic;
% answer = zeros(size(A));
for i=Nc:-1:1
answer(:,i) = A(:,i).*v;
end
toc
tic;
answer = A.*(v*ones(1,Nc));
toc
tic;
answer = A.*(repmat(v,1,Nc));
toc
tic;
answer = bsxfun(@times,v,A);
toc
tic;
answer = A.*v;
toc
tic;
answer = v.*A;
toc
Yields the following times
Elapsed time is 0.915789 seconds.
Elapsed time is 0.863713 seconds.
Elapsed time is 0.911458 seconds.
Elapsed time is 0.685757 seconds.
Elapsed time is 0.302117 seconds.
Elapsed time is 0.285025 seconds.
For the last two, I was checking if order matters. It does not; whichever is first is always slightly faster, but moving "answer = v.*A" earlier makes it the slightly slower one.
2 Comments
Jeffrey Daniels
on 7 Dec 2017
I found the same as David on a larger matrix using Ver 2016b.
A=rand(100000,1000);
[Nr,Nc] = size(A);
v = rand(Nr,1);
tic
answer = A.*v(:,ones(1,Nc));
toc
tic
answer = bsxfun(@times,A,v);
toc
tic
answer = v.*A;
toc
Elapsed time is 2.679808 seconds.
Elapsed time is 2.653772 seconds.
Elapsed time is 1.280892 seconds.
Jonathan
on 26 Feb 2019
"but moving "answer = v.*A" earlier makes it the slightly slower one" - in MATLAB if you run the same computation multiple times it improves the speed on subsequent runs, up to a limit. As v.*A and A.*v are essentially the same then the one run second should be faster.
Matt Fig
on 9 May 2011
I find the looping option fastest on my Win Vista 32 bit. Using r2007b.
>> more_loop_timings
Elapsed time is 0.017479 seconds.
Elapsed time is 0.037864 seconds.
Elapsed time is 0.376887 seconds.
Elapsed time is 0.030641 seconds.
Elapsed time is 0.022277 seconds.
%
%
%
%
function [] = more_loop_timings
% Yet more of these....
t = (1:1000)';
A = rand(length(t),1000); % Large and square
[n,m] = size(A);
%%1. Looping
tic
for jj = m:-1:1
res(:,jj) = t.*A(:,jj); % dynamically pre-allocate
end
toc
%%2. Repmat
tic
res2 = repmat(t,1,m) .* A;
toc
%%3. diag
tic
res3 = diag(t)*A; % Yikes!
toc
% The obvious choice...
tic
res4 = bsxfun(@times,A,t);
toc
%%4. Indexing
tic
t = t(:,ones( 1, m,'single'));
res5 = reshape(A(:).*t(:),n,m);
toc
% isequal(res,res2,res3,res4,res5)
4 Comments
Jan
on 9 May 2011
Your pre-allocation method is much more efficient than ZEROS and the FOR i=1:n! Your method is faster than BSXFUN on my Matlab 2009a also. See edited post.
Andrew Newell
on 9 May 2011
That pre-allocation is sneaky!
James Tursa
on 9 May 2011
Another fast alternative to the ZEROS function for pre-allocating is my UNINIT function from the FEX:
http://www.mathworks.com/matlabcentral/fileexchange/31362-uninit-create-an-uninitialized-variable-like-zeros-but-faster
Eike
on 9 May 2011
Jan
on 9 May 2011
Simplification:
% 4. Indexing
res = t(:, ones(1, m)) .* A;
This is exactly the same effect as performed inside REPMAT. On my Matlab 2009a on a single core BSXFUN is reproducibly the fastest. It is very near to a naively implemented C-Mex version (single threaded, no SSE).
EDITED: Matt's pre-allocation is fast. Matlab 2009a, WinXP 32, 1.5GHz Pentium-M (single core)
t = (1:1000)';
A = rand(1000, 200);
tic; for rep = 1:100; R = func1(t, A); end; toc
function R = func1(t, A) % 0.78 sec
[n, m] = size(A);
R = zeros(n, m);
for i = 1:m
R(:, i) = t .* A(:, i);
end
function R = func2(t, A) % 0.54 sec
[n, m] = size(A);
for i = m:-1:1
R(:, i) = t .* A(:, i);
end
function R = func3(t, A) % 0.54 sec
R = bsxfun(@times, t, A);
function A = func4(t, A) % 0.75 sec
[n, m] = size(A);
for i = 1:m
A(:, i) = t .* A(:, i);
end
7 Comments
James Tursa
on 9 May 2011
Another faster alternative to pre-allocating with the ZEROS function is to use my new UNINIT function from the FEX:
http://www.mathworks.com/matlabcentral/fileexchange/31362-uninit-create-an-uninitialized-variable-like-zeros-but-faster
Jan
on 9 May 2011
I'm very proud to be the first one who downloaded it. All my experiments with mxCreateUninitNumericArray failed. Therefore I'm looking forward to squeeze some more speed out of Matlab.
Does anybody knows something about mxFastZeros?
Andrew Newell
on 9 May 2011
I'm the first too! And the page still says zero downloads, so other people can also be first!
Jan
on 9 May 2011
@Andrew: Then James' function is really powerful: Even the download counter remains uninitialized!
James Tursa
on 9 May 2011
(groan ...) Jan, can you clarify? Did my UNINIT routine fail (which does *not* use mxCreateUninitNumericArray since it is only available from R2008b on), or did your own code fail when you tried to use mxCreateUninitNumericArray? If the former, not sure what the problem is since it worked for me on 32-bit R2006b through R2011a PC WinXP. Unfortunately, I don't have other systems to test with. I haven't tried to figure out how mxFastZeros works yet.
James Tursa
on 9 May 2011
@Jan: The mxFastZeros function signature appears to be this based on some quick testing:
mxArray *mxFastZeros(mxComplexity ComplexFlag, mwSize m, mwSize n);
It returns a double matrix presumably filled with zeros. And it is indeed fast (as fast as my UNINIT function). It does have a drawback in that the returned mxArray is a NORMAL array and not a TEMPORARY array, so you can't return it as-is back to MATLAB without first hacking into it and changing the VariableType field to 4 (TEMPORARY). I got strange results unless I changed the VariableType first. Also, it appears to only return a double matrix. I tried to add a mxClassID argument at the end but it didn't change anything. mxFastZeros appears in all of the MATLAB versions that I have (R2006b through R2011a PC WinXP 32-bit). Don't know about other versions.
Jan
on 10 May 2011
@James: Clarification: Your UNINIT function works fine. I played with mxCreateUninitNumericMatrix before, but without success. Most likely I kicked off some bits from the memory by using bad pointers.
Andrew Newell
on 8 May 2011
I don't find that. If my input is
clear
t=(1:1000)'; A=rand(length(t),200);
[n,m] = size(A);
%%1. Looping
tic
for j = 1 : m
res(:,j) = t .* A( :, j ); % You had an error in this line
end
toc
%%2. Repmat
tic
tmat = repmat( t, 1, m );
res = tmat .* A;
toc
%%3. diag
tic
res = diag( t ) * A;
toc
%%4. Indexing
tic
A = A( : );
t = t( :, ones( 1, m ) );
t = t( : );
res = A .* t;
res = reshape( res, n, m );
toc
The output is
Elapsed time is 0.300359 seconds.
Elapsed time is 0.003952 seconds.
Elapsed time is 0.035276 seconds.
Elapsed time is 0.004409 seconds.
1 Comment
Andrei Bobrov
on 8 May 2011
my research and the result
clear
a = zeros(5,100);
for jj = 1:100
t=rand(1000,1); A=rand(length(t),200);
[n,m] = size(A);
% 1. Looping
tic
for j = 1 : m
res(:,j) = t .* A( :, j );
end
a(1,jj)=toc;
% 2. Repmat
tic
tmat = repmat( t, 1, m );
res = tmat .* A;
a(2,jj)=toc;
% 3. diag
tic
res = diag( t ) * A;
a(3,jj)=toc;
% 4. Indexing
tic
A1 = A( : );
t1 = t( :, ones( 1, m ) );
t1 = t1( : );
res = A1 .* t1;
res = reshape( res, n, m );
a(4,jj)=toc;
% 5. bsxfun
tic
res = bsxfun(@times,t,A );
a(5,jj)=toc;
end
AA = mean(a,2)
AA =
0.0006
0.0023
0.0413
0.0031
0.0015
Anders Munk-Nielsen
on 23 Mar 2012
0 votes
I'm having similar problems in that I have an NxM matrix, where N is extremely large (5 million) and M is fairly small (below 100) and I have a 1xM vector that I want to multiply onto all N rows. I'm trying to improve on an implementation that uses repmat which is incredibly slow and in my intuition this is due to the huge memory footprint.
My question: Why isn't bsxfun any faster than it is? I would expect it to be waaay faster but it's only somewhat faster. If I were to implement a mex function to do this (a multiplyVectorToMatrix function), what is the crucial feature to get built in? Multithreaded'ness?
3 Comments
Anders Munk-Nielsen
on 23 Mar 2012
Btw, here is some code to run what I have in mind (By the way, M is K below):
clc; clear all;
N = 300000;
S = 1000;
K = 10;
x = rand(N,K);
B = rand(K,1);
eps = randn(N,1);
Y = x*B + eps;
Bhat = zeros(K,S);
C = randn(N,S);
%% repmat
tic;
px = (x'*x)\x';
Bhat = px*(repmat(Y,1,S)-C);
fprintf('repmat took %1.3g seconds.\n',toc);
%% for loop
tic;
px = (x'*x)\x';
for s=1:S;
%YmC = Y-C(:,s);
Bhat(:,s) = px*(Y-C(:,s));
end;
fprintf('for took %1.3g seconds.\n',toc);
%% parfor loop
tic;
px = (x'*x)\x';
parfor s=1:S;
Bhat(:,s) = px*(Y-C(:,s));
end;
fprintf('Parfor took %1.3g seconds.\n',toc);
%% bsxfun
tic;
px = (x'*x)\x';
Bhat = px*bsxfun(@minus,Y,C);
fprintf('Bsxfun took %1.3g seconds.\n',toc);
Sean de Wolski
on 23 Mar 2012
bsxfun(@times,yourNxM,yourMx1)
doc bsxfun %for more info.
Sean de Wolski
on 23 Mar 2012
Your setup for timign this isn't really fair. Look at Matt's framework for a more fair comparison.
Categories
Find more on MATLAB in Help Center and File Exchange
Products
Community Treasure Hunt
Find the treasures in MATLAB Central and discover how the community can help you!
Start Hunting!