Multivariate Linear Regression with Weibull errors

Hi,
I am trying to run a linear regression where the dependent variable lies between 0 and 1. The errors are distributed by 2 parameter Weibull distribution. So essentially I will be estimating 4 parameters: a constant, a parameter for an independent variable and 2 parameters of the weibull distribution. Can somebody guide me as to how I proceed to do this? I am mostly getting search results for survival data but my data is non-survival data. Thanks!

 Accepted Answer

Do some reading about maximum likelihood estimation . This is the general technique one uses for an estimation problem where the errors in the process are assumed to follow some known distribution that is not Gaussian.
Sorry, but I won't write the code for you. One hint I will offer is that it is almost always better to maximize the log (i.e., minimize the negative of the log) of the likelihood function, as otherwise the problem quickly becomes intractable in double precision arithmetic.

4 Comments

Thank you! I read more about my problem. And the way to solve it is to to break the problem in two parts. Step 1: estimate linear regression parameters using a simplex algorithm and Step 2: using the fitted values of residuals(from step 1) to get an estimate of parameters of weibull distribution which is followed by the errors.
Sorry. That is a simple, ad hoc method that does not in fact do what you described as your goal.
What you have described is a bastardization, that uses a poor scheme for estimating the curve parameters (a simplex method. You have not stated what it would be minimizing. A sum of squares would be simply wrong for this problem.) And then you will assume the errors actually follow a Weibull distribution, so then apply a scheme for estimating the parameters of the Weibull. Sorry, but completely ad hoc, and not terribly good statistics.
Feel free to do as you want to do. Your code is yours to write and use after all. You can also use a crystal ball to estimate the parameters. I hear that tea leaves give good estimates too, but that Tarot cards may be easier to read for this. I think you need special Weibull based Tarot cards though. :)
Perhaps you have merely mis-described/mis-understood what you read online, but what you described was just a bad scheme. You get what you pay for on the internet, and you can find all sorts of drivel there.
True maximum likelihood estimation is not that difficult to do. Be careful if you do decide to employ Nelder-Mead in any scheme, since Nelder-Mead cannot handle inequality constraints, and a 2-parameter Weibull is always non-negative.
Let me explain in a little more detail what this will involve.
You need to have an objective function that for ANY set of 4 parameters, thus two line parameters, and two Weibull parameters...
1. Compute the residual. That MUST be a positive number for all data points, so you need to use a tool than can handle constraints.
2. Given the set of PRESUMED Weibull parameters, compute the Weibull PDF for each of those residuals, for those given parameters.
3. Return the product of those individual values from the PDF. This may be a very tiny number, in fact, you will almost always see underflows in this. So it is better if you compute the sum of logs of those values, and maximize that sum instead.
So this will be a 4 parameter optimization, subject to linear inequality constraints at each data point, as well of course, on the Weibull parameters since the shape and scale parameters for the Weibull must both be positive.
The best tool for the optimization would arguably be something like fmincon, although even there you need to be careful, since fmincon does allow for tiny constraint deviations. It's been a while since I looked, but one of the algorithms in fmincon may assure strict adherence of the constraints.
Thank you so much for your guidance. I do accept that the method outlined by me is inferior to a full mle (using 4 parameters), but I don't know why it was suggested in a Biometrika paper.
I am relatively new to matlab, but I will try to write a code for full mle of the model with constraints. fmincon does seem to be the correct choice but I have as yet not been able to write the correct code.
Thank you again, for pointing out my mistake, I was trying to jump to an easy solution while frantically looking for a way to solve the problem.

Sign in to comment.

More Answers (1)

Prachi  Singh
Prachi Singh on 9 May 2016
Edited: Prachi Singh on 9 May 2016
Hi John,
I am trying to run the below code -
--------------------
%Log likelihood function for 4 parameter model(2 line and 2 Weibull parameters)
%p(1), p(2) are Weibull shape parameters & p(3) is line intercept parameter, p(4) is slope parameter
function val=tryone(p,xdata, ydata)
p = [p(1), p(2), p(3), p(4)];
val = -sum(log(p(1))) -sum(log(p(2))) -(p(2)-1)*sum(log(ydata - p(3) - p(4)*xdata))+ p(1)*sum((ydata - p(3) - p(4)*xdata).^p(2));
%Constraint for Weibull errors to be positive
function [c, ceq] = errors(p,xdata,ydata)
c = ydata - p(3) - p(4)*xdata;
ceq = [ ];
%Code:
p_initial = [0.1, 0.1, 0.1, 0.1];
options = optimoptions(@fmincon,'Display','iter','Algorithm','interior-point');
[x,fval] = fmincon(@tryone,p_initial,[],[],[],[],[],[],@errors,options)
---------------------------------------
I am encountering few problems:
1) I don't know how to incorporate constraint for Weibull parameters to be positive
2) I am getting an error "Not enough input arguments" for this command (loglikelihood function mentioned above) -
val = -sum(log(p(1))) -sum(log(p(2))) -(p(2)-1)*sum(log(ydata - p(3) - p(4)*xdata))+ p(1)*sum((ydata - p(3) - p(4)*xdata).^p(2));
I am really new to Matlab and trying my best to understand things, please let me know if I have committed an error while writing the code.
Thanks alot for your patience!

Tags

Community Treasure Hunt

Find the treasures in MATLAB Central and discover how the community can help you!

Start Hunting!