Parallelising mldivide for millions of different linear fits

Hi, I regularly need to find the gradient of 3 million sets of data. Each set is about 30 points of X and Y, and I find the gradient of each set using the mldivide (backslash) operator: g = X\Y. X and Y are different for each of the 3 million sets. X is [n,1] and Y is [n,1], where n ~ 30.
I currently have a for loop that cycles through all 3 million sets of 30 data points and finds 3 million gradients, but surely there is a way to speed this up in MATLAB using a matrix operation? I've searched a lot and gone through a bunch of examples, but I haven't come across a solution that I've been able to understand. Can anyone help me, please?
I don't have the MATLAB Statistics or Parallel Computing toolboxes.
  5 Comments
Athrunsan on 29 May 2015
The number of points can vary, but where there are fewer I think I can just zero-pad, which shouldn't affect the fit result when it passes through the origin anyway.
John D'Errico on 29 May 2015
Zero padding would affect some things, such as statistics on the result. But if you are forcing the fit through the origin, then a zero pad will not impact the estimated slope.


Accepted Answer

John D'Errico on 29 May 2015
Edited: John D'Errico on 29 May 2015
Given the comments, it turns out that the problem is to do a fit, with NO intercept, and the same number of points in each set. (Zero padding will be done for sets with fewer points, which will not impact the slope estimate.)
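A minimal sketch of the zero padding described above, assuming the ragged data arrives as a cell array of [x y] matrices (the variable `sets` and its values are hypothetical, made up for illustration). It pads every set to a common length and checks that the padded and unpadded fits through the origin agree.

```matlab
% Hypothetical ragged input: each cell holds one set as [x y] rows
sets = {[1 2; 2 4.1; 3 5.9], [1 3; 2 6.2], [2 1; 4 2.1; 6 2.9; 8 4.2]};
nsets = numel(sets);
maxlen = max(cellfun(@(s) size(s,1), sets));

% One set per column; rows beyond a set's length stay zero
xdata = zeros(maxlen, nsets);
ydata = zeros(maxlen, nsets);
for k = 1:nsets
    m = size(sets{k},1);
    xdata(1:m,k) = sets{k}(:,1);
    ydata(1:m,k) = sets{k}(:,2);
end

% Compare the slope from the padded column with the unpadded fit
padded = zeros(1,nsets);
unpadded = zeros(1,nsets);
for k = 1:nsets
    padded(k) = xdata(:,k)\ydata(:,k);
    unpadded(k) = sets{k}(:,1)\sets{k}(:,2);
end
maxdiff = max(abs(padded - unpadded));
fprintf('largest padded vs unpadded difference: %g\n', maxdiff);
```

The padded rows contribute x = 0 and y = 0, so they add nothing to the sums that determine a slope through the origin.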
As a reference for the time, do it first with a loop.
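n = 1000000;                 % number of independent fits
xdata = randn(30,n);         % one set of 30 x-values per column
ydata = randn(30,n);         % matching y-values, one set per column
tic
coef = zeros(1,n);           % preallocate the slopes
for ind = 1:n
    % least-squares slope through the origin for set ind
    coef(ind) = xdata(:,ind)\ydata(:,ind);
end
toc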
Elapsed time is 7.643831 seconds for 1e6 sub-problems.
So doing 1000000 of these with a loop took roughly 7.6 seconds on my machine.
A fit with no intercept is easy enough to do, even without backslash. So we assume a model of y=slope*x.
slope = sum(x.*y)./sum(x.^2)
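For reference, this formula follows from minimising the sum of squared residuals for the model y = slope*x. A short derivation, writing the slope as b and the number of points as n (notation chosen here, not taken from the answer):

```latex
S(b) = \sum_{i=1}^{n} (y_i - b\,x_i)^2,
\qquad
\frac{dS}{db} = -2 \sum_{i=1}^{n} x_i \,(y_i - b\,x_i) = 0
\;\Longrightarrow\;
b = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}
```

This is the same least-squares solution that backslash returns for a single column x, as long as x is not all zeros.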
Given the arrays xdata and ydata...
tic
slope = sum(xdata.*ydata,1)./sum(xdata.^2,1);
toc
Elapsed time is 0.316517 seconds.
So this is considerably faster (roughly 24 times, going by the two timings above) for this simple case.
Note that there MAY be numerical issues with this formula. If the independent variable has a large (non-zero) mean, then the above formula will be less accurate than we would expect from double precision. In this case, it is essentially a tradeoff of time for accuracy. (I can fix that problem too, at some cost in time.)
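One way to see whether this matters for your data is to compare the closed-form slopes against backslash on a sample. The sketch below uses made-up data with a deliberately large mean in x (the offset 1e6 and the true slope 2.5 are assumptions for illustration). It only reports the largest relative difference; it does not implement the more accurate variant mentioned above.

```matlab
% Hypothetical test data: x has a large mean, y follows y = 2.5*x plus noise
n = 1000;
xdata = 1e6 + randn(30,n);
ydata = 2.5*xdata + randn(30,n);

% Closed-form slope for every column at once
fast = sum(xdata.*ydata,1)./sum(xdata.^2,1);

% Reference slopes from backslash, one column at a time
ref = zeros(1,n);
for ind = 1:n
    ref(ind) = xdata(:,ind)\ydata(:,ind);
end

% Largest relative disagreement between the two methods
relerr = max(abs(fast - ref)./abs(ref));
fprintf('largest relative difference: %g\n', relerr);
```

If the reported difference is well below the accuracy you need, the fast formula is probably adequate for that kind of data; if not, backslash remains the safer choice.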
