Main Content

Improve Model Fit with Weights

This example shows how to fit a polynomial model to data using both the linear least-squares method and the weighted least-squares method for comparison.

Generate sample data from different normal distributions by using the randn function.

rng("default")  % For reproducibility

rnorm = [];
idx = [];

for k=1:20
    r = k*randn([20,1]) + (1/20)*(k^3);
    rnorm = [rnorm;r];
    idx = [idx;ones(20,1).*k];
end

The dependent variable rnorm contains sample data from 20 normal distributions. The independent variable idx contains integers indicating whether two elements in rnorm are sampled from the same normal distribution.

Fit a third-degree polynomial model to idx and rnorm. Return information about the coefficient estimates and the algorithm used to fit the model.

[p3fit,~,fitinfo] = fit(idx,rnorm,"poly3")
p3fit = 
     Linear model Poly3:
     p3fit(x) = p1*x^3 + p2*x^2 + p3*x + p4
     Coefficients (with 95% confidence bounds):
       p1 =     0.05156  (0.0438, 0.05932)
       p2 =    -0.03993  (-0.2875, 0.2076)
       p3 =      0.1418  (-2.124, 2.408)
       p4 =      0.0462  (-5.585, 5.678)
fitinfo = struct with fields:
        numobs: 400
      numparam: 4
     residuals: [400x1 double]
      Jacobian: [400x4 double]
      exitflag: 1
     algorithm: 'QR factorization and solve'
    iterations: 1

p3fit contains the estimates for the model coefficients with 95% confidence intervals. The default fitting method for fitting a polynomial model is linear least squares. fitinfo contains information about the fitting algorithm used to fit the coefficients to the data. The error in the data can be estimated by the residuals stored in fitinfo.

Plot the residuals using a stem plot.

stem(idx,fitinfo.residuals)
xlabel("idx")
ylabel("residuals")

The plot of the residuals shows that their variance increases as the values in idx increase. The nonconstant variances across the different values of idx indicate that the weighted least-squares fitting method is more appropriate for calculating the model coefficients.

Create a vector of zeros for later storage of the weights.

W = zeros(length(rnorm),1);

The weights you supply transform the residual variances so that they are constant for different values of idx. Define the weight for each element in rnorm as the reciprocal of the residual variance for the corresponding value in idx. Then fit the model with the weights.

for k=1:20
    rnorm_idx = rnorm(idx==k);
    recipvar = 1/var(rnorm_idx);
    w = (idx==k).*recipvar;
    W = W+w;
end
[wp3fit,~,wfitinfo] = fit(idx,rnorm,"poly3","Weights",W)
wp3fit = 
     Linear model Poly3:
     wp3fit(x) = p1*x^3 + p2*x^2 + p3*x + p4
     Coefficients (with 95% confidence bounds):
       p1 =     0.04894  (0.04419, 0.0537)
       p2 =     0.03601  (-0.08744, 0.1595)
       p3 =     -0.4262  (-1.253, 0.4009)
       p4 =      0.9836  (-0.1959, 2.163)
wfitinfo = struct with fields:
        numobs: 400
      numparam: 4
     residuals: [400x1 double]
      Jacobian: [400x4 double]
      exitflag: 1
     algorithm: 'QR factorization and solve'
    iterations: 1

wp3fit and wfitinfo contain the results of the weighted least-squares fitting.

Display p3fit, wp3fit, and rnorm in the same plot.

plot(p3fit,idx,rnorm)
hold on
plot(wp3fit,"g")
xlabel("idx")
ylabel("rnorm")
legend(["rnorm","linear least-squares fit","weighted least-squares fit"])
hold off

The plot shows wp3fit closely tracking p3fit.

You can determine whether wp3fit is a better fit than p3fit by plotting the residuals.

stem(idx,wfitinfo.residuals)
xlabel("idx")
ylabel("residuals")

The output shows that the wp3fit residuals are smaller than the p3fit residuals. The variances of the wp3fit residuals are also more similar for different values of idx than the variances of the p3fit residuals.

See Also

Functions

Related Topics