Linear Fit of data with uncertainty
Hi,
I have measured a data set y (depending on data x) with measurement uncertainties dy.
I want to fit y linearly in x, with weights. This is what I use:
fitlm(x,y,'y ~ x1','Weights',1./(dy.^2));
Now the linear model gives me a standard error of the coefficients, but that is based solely on the variance of the data points (and the weights).
If I were to multiply my weights by any factor, the coefficients' standard error would not change.
But my problem is: I assume that the fit parameters would be much more uncertain if dy were greater.
For example, if all my data aligned perfectly linearly, the estimated coefficients would have zero error. That would be wrong in my case, though, as the coefficients could still vary based on my measurement uncertainties dy.
fitlm seems to focus on the statistical scatter of the data, rather than the stated error of each data point.
I didn't find anything about this in a MATLAB context.
I assume this is a problem common to, e.g., experimental physics, so I am surprised no one really talks about it...
Would be glad to find a solution here, thanks!
Answers (1)
Adam Danz
on 10 Jan 2020
There seems to be some confusion about the interpretation of observation weights.
"If I were to multiply my weights by any factor, the coefficients' standard error would not change."
It is important to note that the weight for each observation is given relative to the weights of the other observations, so different sets of absolute weights can have identical effects.
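To see this scale invariance concretely, here is a minimal sketch (the data and weights are made up for illustration) showing that multiplying all weights by a constant leaves fitlm's coefficient standard errors unchanged:

```matlab
% Made-up data for illustration
rng(0)
x = rand(50,1)*10;
y = 1.2*x + 3.1 + randn(50,1);
w = rand(50,1) + 0.5;                    % arbitrary positive weights
mdlA = fitlm(x, y, 'Weights', w);
mdlB = fitlm(x, y, 'Weights', 100*w);    % same weights, scaled by 100
% mdlA.Coefficients.SE equals mdlB.Coefficients.SE:
% only the relative sizes of the weights matter.
```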
The weights merely indicate how much each data point is relied upon during fitting. From the same resource quoted above,
The size of the weight indicates the precision of the information contained in the associated observation. Optimizing the weighted fitting criterion to find the parameter estimates allows the weights to determine the contribution of each observation to the final parameter estimates.
To demonstrate this, a linear model is fit below with two different sets of weights. The top subplot shows weights that are a function of the residuals: values close to the regression line (not shown) get higher weights, and values further from the regression line get lower weights. In the second subplot the weights are random. The standard error of the coefficients is written in the upper-left corner of each subplot. As you can see, for the exact same observations, the error is smaller when the weights favor the values least affected by noise.
% Create data
x = rand(1,200) * 10;
y = x*1.2 + 3.1;
% Add noise to the y values
% noiseIdx = unique(randi(numel(y),1,numel(y)));
noiseIdx = 1:numel(x);
randVals = (rand(size(noiseIdx))-0.5)*5;
y(noiseIdx) = y(noiseIdx) + randVals;
% Create weights based on noise level of each point.
w = ones(size(x));
w(noiseIdx) = 1-(abs(randVals)/max(abs(randVals)));
% Create random weights
wr = rand(size(w));
% fit model with random weights and fit model with
% weights that penalize noise
mdl1 = fitlm(x,y,'Weights',wr);
mdl2 = fitlm(x,y,'Weights',w);
% plot data
clf()
colors = jet(100);
subplot(2,1,1)
scatter(x,y,25,colors(ceil(w*99)+1,:),'Filled')
set(gca,'colormap',colors)
cb = colorbar();
ylabel(cb,'Weight')
title('Weights ~~ Noise')
text(min(xlim),max(ylim),sprintf('SE: %s',num2str(mdl2.Coefficients.SE')))
subplot(2,1,2)
scatter(x,y,25,colors(ceil(wr*99)+1,:),'Filled')
set(gca,'colormap',colors)
cb = colorbar();
ylabel(cb,'Weight')
title('Weight ~~ random')
text(min(xlim),max(ylim),sprintf('SE: %s',num2str(mdl1.Coefficients.SE')))

2 Comments
Leon Stecher
on 12 Oct 2020
Hey!
Is there a way to fit data to a model function not using weights, but actually using the uncertainties on the measured data?
John D'Errico
on 12 Oct 2020
@Leon Stecher - You are not asking to supply weights, but instead supply the actual data variance as a known value at each point.
The classical solution for a WEIGHTED least squares problem produces a solution that treats the weights as only relative things. Thus if you double all of the weights, it has no impact on the solution. Here we can effectively rescale the weights such that the norm of the weight vector is 1, just to make the algebra a little more well behaved. For example, we can read the help for lscov, to find this statement:
"lscov assumes that the covariance matrix of B is known only up to a
scale factor. MSE is an estimate of that unknown scale factor, and
lscov scales the outputs S and STDX appropriately."
But that is clearly not what you wish to see. Instead, you wish to pose a model where the data variance is known at each point. Effectively you have the case where you have a known covariance matrix. And that would seem to change things. So if we read further down in the help for lscov, we see this:
" ... However, if V is
known to be exactly the covariance matrix of B, then that scaling is
unnecessary. To get the appropriate estimates in this case, you should
rescale S and STDX by 1/MSE and sqrt(1/MSE), respectively."
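Putting that together, here is a minimal sketch (with made-up data; dy stands for the known per-point uncertainty from the original question) of how one might apply the rescaling described in the lscov help:

```matlab
% Made-up data with a known measurement uncertainty dy per point
x  = (1:10)';
dy = 0.3*ones(10,1);                 % known absolute uncertainties
y  = 2*x + 1 + dy.*randn(10,1);
A  = [ones(size(x)) x];              % design matrix for y = b1 + b2*x
[b, stdx, mse] = lscov(A, y, 1./dy.^2);
% lscov treats the weights as relative and scales STDX by sqrt(MSE).
% If dy is the exact known standard deviation of each point, undo that:
stdx_known = stdx ./ sqrt(mse);      % rescale STDX by sqrt(1/MSE)
```

Here stdx_known reflects the stated measurement uncertainties rather than the scatter of the residuals.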