Line of Best Fit through Scattered Data
I need to find the line of best fit through my scatterplot. I have attached my text file, and my code is the following.
clear all
fid = fopen( 'oo20.txt');
data = textscan(fid, '%f%f', 'Delimiter', '|', 'TreatAsEmpty','~');
fclose(fid);
GalList.year = data{1};
D = data{2};
X1 = GalList.year;
Y1 = D;
scatter(X1,Y1);
ylim([0 20])
3 Comments
Star Strider
on 11 Oct 2015
I just peeked at it and it seems reasonable to delete this one:
~|3.4028886
Does it have any specific significance?
Accepted Answer
Star Strider
on 11 Oct 2015
I don’t see anything strange about the data, but the regression is failing. Both variables are column vectors. With the full data set, the two-parameter (intercept and slope) regression estimates both parameters as zero, and the one-parameter (through the origin) regression also estimates its single parameter as zero. Trying it with polyfit returns NaN for both parameters, so the problem isn’t in my code. (I deleted the polyfit calls from the posted code.) It’s not obvious to me what the problem is, but with three attempts failing, something is very wrong somewhere.
Of interest, everything works fine with a random sample of 280 data pairs, giving an intercept of 31 BCE (so that’s when astronomy began!), and a slope of +0.022 (is that Galaxies Discovered/Year?). Any more than 280 breaks the code for some reason.
I don’t believe the duplicated years should cause problems, since linear regression is usually robust to them. If you have any insights into what the problem may be with your full data set, please share them. You know the data better than I do, and what they should look like.
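One thing that may be worth checking (a possible cause, not something confirmed in this thread): textscan fills fields matched by 'TreatAsEmpty' with NaN, so the row `~|3.4028886` mentioned in the comments would put a NaN in the year column. A minimal sketch of finding and dropping such rows before fitting, using made-up stand-in values rather than the real file:

```matlab
% Stand-in for the two columns returned by textscan (values are made up).
% The NaN mimics the '~' field that 'TreatAsEmpty' converts to NaN.
X1 = [1990; NaN; 2000; 2005; 2010];
Y1 = [3.1; 3.4; 5.2; 6.0; 7.3];

bad = isnan(X1) | isnan(Y1);            % rows with a missing value in either column
fprintf('%d row(s) contain NaN\n', nnz(bad));

Xc = X1(~bad);                          % keep only the complete rows
Yc = Y1(~bad);

B = [ones(size(Xc)) Xc] \ Yc;           % [intercept; slope] from the complete rows
```

If the full data set contains only a few such rows, a small random subset would often miss them, which would be consistent with the subset working while the full set fails.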
I plotted a linear fit tonight. Are there other models that might describe what you’re observing better and that you’d like to try?
This is the end of my day, so I’ll come back to this in the morning.
My code:
fidi = fopen('jgillis16 oo20.txt', 'rt');
D = textscan(fidi, '%f|%f', 'CollectOutput',1, 'TreatAsEmpty','~');
fclose(fidi); % Close The File After Reading
X1 = D{1}(:,1); % Year
Y1 = D{1}(:,2); % Distance
RandRows = randi(length(X1), 280, 1); % 280 Random Row Indices (Sampled With Replacement)
X1 = X1(RandRows); % Hypothesis: Works With Random Subset -> Accepted
Y1 = Y1(RandRows);
DesignMtx = [ones(size(X1)) X1]; % MODEL: DesignMtx*B = Y1
B2 = DesignMtx\Y1; % Linear Biparametric Regression: Estimate Parameters [Intercept; Slope]
Yhat2 = DesignMtx*B2; % Linear Biparametric Regression: Generate Line
B1 = X1\Y1; % Linear Uniparametric Regression: Estimate Parameter
Yhat1 = X1*B1; % Linear Uniparametric Regression: Generate Line
XTX = (DesignMtx'*DesignMtx); % X'X (Not Used Below)
figure(1)
scatter(X1, Y1, 'bp')
hold on
plot(X1, Yhat2, '-r')
hold off
grid
xlabel('Year')
ylabel('Distance (Parsecs)')
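The subset in the code above is drawn with randi, which samples with replacement, so some rows can appear more than once. If distinct rows are wanted, randperm is one alternative. A small sketch, with made-up stand-in vectors in place of the file data:

```matlab
% Stand-in column vectors; in the thread these come from textscan (values are made up)
X1 = (1:1000)';
Y1 = 0.5*X1;

nSample  = 280;                              % subset size used in the thread
RandRows = randperm(length(X1), nSample);    % distinct row indices, no repeats
Xs = X1(RandRows);
Ys = Y1(RandRows);
```

Xs and Ys can then be used in place of X1 and Y1 in the regression lines.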
3 Comments
Star Strider
on 12 Oct 2015
My pleasure!
Extrapolating back to the x-intercept, the first galaxy discovered was in January 1409. It was undoubtedly the Milky Way, because it has a distance of zero (we’re in it).
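For a fitted line y = b0 + b1*x, the x-intercept is where y = 0, so x0 = -b0/b1. A quick sketch of that arithmetic, assuming the rounded values quoted earlier in the thread; reading "31 BCE" as an intercept of about -31 is an assumption here, and the exact fitted coefficients are not shown:

```matlab
% Rounded coefficients as quoted in the thread (assumed: intercept -31, slope 0.022)
B2 = [-31; 0.022];          % [intercept; slope]

x0 = -B2(1) / B2(2);        % year at which the fitted line crosses zero distance
fprintf('x-intercept: about %.1f\n', x0);
```

With these rounded inputs the crossing falls in early 1409, in line with the date above.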
Having fun with the numbers...
More Answers (2)
Image Analyst
on 11 Oct 2015
There are other more sophisticated methods, but try polyfit() and polyval(). See attached demo.
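The attached demo is not reproduced here. As a rough illustration of the polyfit() and polyval() approach, here is a minimal sketch on made-up data (the variable names and values are placeholders, not the contents of the demo):

```matlab
% Made-up data: a straight line with a small deterministic wobble
x = (1:20)';
y = 2*x + 1 + 0.1*sin(7*x);

p    = polyfit(x, y, 1);    % degree-1 fit: p(1) is the slope, p(2) the intercept
yFit = polyval(p, x);       % fitted values at the original x locations

fprintf('slope = %.3f, intercept = %.3f\n', p(1), p(2));
```

Calling plot(x, y, 'o', x, yFit, '-') afterwards overlays the fitted line on the points. Note that polyfit does not skip NaN values, so any rows with missing data need to be removed first.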