Polynomial 2nd degree

moeJ on 8 Apr 2019
Edited: moeJ on 3 May 2019
I tried using the Curve Fitting Tool; however, I got an error saying 'Data sizes are incompatible.'
dataset= xlsread('ML.xlsx','sheet2')
a=dataset(:,1)
S=dataset(:,3:9)
D= repelem(a(1:end, :), 1, 7)
cftool
  4 Comments
dpb on 12 Apr 2019
I've no idea what you think
SalMax=repelem(Salinity(1:end, :), 1, 7)
is for or doing but to fit a higher order polynomial with polyfit you just set the desired order; it does everything else automagically.
Attach the data set; the first step in any fitting problem is to visualize the data...we can't do anything with only the response variable.
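As a minimal sketch of the polyfit route (the data here are made-up stand-ins, and the names sal, B and P are only illustrative): fit the response against one predictor column at a time, so that x and y are always vectors of the same length and no repelem replication is needed.

```matlab
% Hypothetical stand-in data: sal is the response, B holds 3 predictor columns.
rng(0);
B   = rand(12, 3);
sal = 60 + 5*B(:,1) - 3*B(:,1).^2 + 0.1*randn(12,1);

nB = size(B, 2);
P  = zeros(nB, 3);                      % one row of [a2 a1 a0] per predictor column
for k = 1:nB
    P(k,:) = polyfit(B(:,k), sal, 2);   % x and y are both 12-by-1 vectors
end
salHat1 = polyval(P(1,:), B(:,1));      % fitted values from the first column's quadratic
```

Changing the last argument of polyfit is all it takes to change the polynomial degree.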
dpb on 13 Apr 2019
I'll try to look at the data some this evening; that will help some...meanwhile you've not really yet given any meaningful context to what the other variables are and I have no idea what " a sailinty column and different bands paremeters ( 7 columns), from which i need to generate a predicted salinity and eventually the equation that i would use in GIS" actually means or how that bears upon the problem.
What are "different band parameters"? Without any idea at all of what the data are, it's tough to have any clue as to what makes any physical sense at all...just because one could find a set of variables and a polyfit of given degree doesn't mean one should.

Accepted Answer

dpb on 14 Apr 2019
Edited: dpb on 14 Apr 2019
Higher order polynomials aren't going to help here...the following code generated the figure:
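for i=3:9                          % columns 3..9 of SB hold B1..B7
    subplot(4,2,i-2);
    [~,ix]=sort(SB(:,i));          % sort by the predictor so the line runs left to right
    plot(SB(ix,i),SB(ix,1))        % column 1 (the response) vs. the i-th column
    xlabel(sprintf('B%d',i-2))
end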
There's virtually no correlation between salinity and most of the B variables; what correlation there is over part of the range breaks down elsewhere for every variable.
Just taking a stab, I ran one fit that uses all the variables and another with only the two that were statistically significant; those results are
>> fitlm(SB(:,3:end),SB(:,1))
ans =
Linear regression model:
y ~ 1 + x1 + x2 + x3 + x4 + x5 + x6 + x7
Estimated Coefficients:
                   Estimate     SE      tStat    pValue
                   ________    _____    _____    ______
    (Intercept)      63.33     13.62     4.65     0.13
    x1              -29.51     18.92    -1.56     0.36
    x2               20.66     15.52     1.33     0.41
    x3                4.74     14.23     0.33     0.80
    x4               -4.83     15.69    -0.31     0.81
    x5               20.96     33.11     0.63     0.64
    x6               -0.02     21.87    -0.00     1.00
    x7                1.42     16.83     0.08     0.95
Number of observations: 9, Error degrees of freedom: 1
Root Mean Squared Error: 2.49
R-squared: 0.934, Adjusted R-Squared 0.474
F-statistic vs. constant model: 2.03, p-value = 0.495
>> figure
>> LMA=ans;
>> plot(LMA)
>> title('Salinity ~ 1 + B1 +B2 + B3 +B4 + B5 +B6 + B7')
>> LM12=fitlm(SB(:,3:4),SB(:,1))
LM12 =
Linear regression model:
y ~ 1 + x1 + x2
Estimated Coefficients:
                   Estimate     SE     tStat    pValue
                   ________    ____    _____    ______
    (Intercept)      67.74     2.13    31.82     0.00
    x1              -20.86     5.90    -3.53     0.01
    x2               21.88     3.21     6.82     0.00
Number of observations: 9, Error degrees of freedom: 6
Root Mean Squared Error: 1.33
R-squared: 0.887, Adjusted R-Squared 0.85
F-statistic vs. constant model: 23.6, p-value = 0.00144
>> figure
>> plot(LM12)
>> title('Salinity ~ 1 + B1 +B2')
>> ylim([40 100])
The last puts the plots on the same scale; notice the intervals are much tighter with only two predictors. Whether this would be worth a hoot for future predictions is pretty much pure luck I'd guess...
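The gap between R-squared and adjusted R-squared in the two displays is largely a degrees-of-freedom effect. As a rough check, the standard adjustment formula can be applied to the printed values; the helper name adjR2 is just illustrative, and the small difference from the displayed 0.474 is assumed to come from R-squared being rounded to three digits in the display.

```matlab
% Values copied from the two fitlm displays above.
n = 9;                                               % number of observations
adjR2 = @(R2, p) 1 - (1 - R2)*(n - 1)/(n - p - 1);   % p = number of predictors

adjAll = adjR2(0.934, 7);   % seven predictors leave 1 error degree of freedom
adjTwo = adjR2(0.887, 2);   % two predictors leave 6 error degrees of freedom
```

With seven predictors and nine observations the (n-1)/(n-p-1) factor is 8, so almost any unexplained variance is penalized heavily; with two predictors the factor is only 8/6.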
  7 Comments
dpb on 15 Apr 2019
Edited: dpb on 15 Apr 2019
"... the issue is in the data itself."
Well, yes and no...the specific dataset certainly hasn't much apparent correlation with any simple combination of the B vectors, true. You also don't have much data to go on unless this is just a tiny subset of the whole data set?
In the larger picture, it does appear from the tables that those folks did have a large-enough dataset that they could split between a fitting set and a testing set to check on the model to some extent, anyway. You couldn't do that with any confidence at all here simply for lack of enough data to do so.
However, while it's not possible to say for certain without seeing the whole rationale behind the fitting process undertaken, it still looks to me like the modelling was just "throwing darts": trying ad hoc combinations until they happened to find something that fit. That is fraught with danger in that while it may work for a given data set, without some rationale behind it, future data may not fit at all. That they did make some verification effort at least guards against that to a degree, but it's still not very satisfying that there's no rationale for choosing the model other than chance correlation.
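For what a fitting/testing split looks like in practice, here is a minimal sketch on made-up data (not the poster's data set; the 70/30 split and all variable names are assumptions chosen for illustration). It needs far more than nine observations to mean anything.

```matlab
% Hypothetical data, large enough to split.
rng(1);
n = 200;
X = rand(n, 2);
y = 68 - 21*X(:,1) + 22*X(:,2) + randn(n,1);

idx   = randperm(n);
nFit  = round(0.7*n);
iFit  = idx(1:nFit);                 % rows used to estimate the model
iTest = idx(nFit+1:end);             % rows held back to check it

mdl      = fitlm(X(iFit,:), y(iFit));
yPred    = predict(mdl, X(iTest,:));
rmseTest = sqrt(mean((y(iTest) - yPred).^2));   % compare with mdl.RMSE
```

If rmseTest comes out much larger than mdl.RMSE, the model is fitting noise in the fitting set rather than anything that carries over to new data.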
moeJ on 16 Apr 2019
Edited: moeJ on 16 Apr 2019
I see I see. Thank you so so much for your immense help over the past few days. I'll have to check back with my faculty members regarding the figures they provided me with before I can proceed, hopefully they can provide me with a large-enough dataset or something to help. Again, I really appreciate your effort, you've been a great help!

More Answers (1)

dpb on 9 Apr 2019
Taking a shot that the presumption earlier is the correct one--
xy=xlsread('ML.xlsx','sheet2');   % read the data into array
N=size(xy,2)-1;                   % there is one fewer y vector than columns in array
mdl=cell(N,1);                    % create an empty cell array to hold fit results
for i=1:N                         % for each "y" column
    mdl{i}=fitlm(xy(:,1),xy(:,i+1),'purequadratic');   % fit the quadratic, store in cell array
end
will result in an N-by-1 cell array holding the N LinearModel objects. To see each, just dereference the cell content with the curlies (braces). I just did one with a set of randn() values so the coefficients are near zero, but you get the following output by default. See the doc for fitlm and the link to the LinearModel properties to see all about it...
>> mdl{1}
ans =
Linear regression model:
y ~ 1 + x1 + x1^2
Estimated Coefficients:
                   Estimate     SE     tStat    pValue
                   ________    ____    _____    ______
    (Intercept)      -0.05     0.71    -0.07     0.95
    x1                0.03     0.16     0.22     0.83
    x1^2             -0.00     0.01    -0.66     0.52
Number of observations: 20, Error degrees of freedom: 17
Root Mean Squared Error: 0.95
R-squared: 0.17, Adjusted R-Squared 0.0722
F-statistic vs. constant model: 1.74, p-value = 0.206
>>
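To get the pieces needed outside MATLAB (the equation, R^2, and the predicted values), the LinearModel properties can be read directly. A minimal sketch, assuming a single made-up x/y pair in place of the real columns; the variable names and the equation text format are illustrative only:

```matlab
% Hypothetical stand-in for one x/y pair; replace with real columns.
x = (1:20)';
y = 3 + 0.4*x - 0.02*x.^2;

m    = fitlm(x, y, 'purequadratic');
b    = m.Coefficients.Estimate;      % [intercept; x1; x1^2], in the displayed order
R2   = m.Rsquared.Ordinary;          % m.Rsquared.Adjusted is also available
yhat = predict(m, x);                % same as m.Fitted for the fitting data

% Text form of the equation, e.g. for pasting into another tool.
eqn = sprintf('%.4g + %.4g*x + %.4g*x^2', b(1), b(2), b(3));
```

For the cell array above, the same calls apply to each element, e.g. mdl{1}.Coefficients.Estimate.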
  4 Comments
Image Analyst on 13 Apr 2019
Edited: Image Analyst on 13 Apr 2019
Despite a strong hint from me and a direct request from dpb, you've still not attached your data, 'ML.xlsx'. Why not? Please do so if you want good answers from here on out.
moeJ on 13 Apr 2019
Kindly find the data set attached.
I need to do a quadratic regression and end up with a formula (to use in GIS), R^2, and predicted salinity values, please.
Thank you so much for your help again.
