using fminsearch on several data sets simultaneously
Hello everyone,
I'm a beginner with MATLAB, so please explain everything in detail...
I have several data sets (testI followed by a number) that follow the same equation with three parameters. The first two are different for each data set; the third is the same for all sets. I am trying to find the parameters that best fit my data.
I don't have any problems with fitting each data set individually using
[p1, c2] = fminsearch(@(p) chi2S(p, testConc, testI, testE), p0)
p1 consists of three values: a, b, and K. When I use the function on each data set separately, I get different 'best' values for a, b, and K.
What I want is to fit all sets simultaneously so that I get a single value of K that is best for all sets; a and b may vary for each set.
I hope I am more or less clear; if you need more info please ask.
Let me know if you have any ideas. Thanks, Yamel
More Answers (4)
Walter Roberson
on 18 Jun 2012
Minimize the sum of the squares of the fits for all three sets:
[p1, c2]=fminsearch(@(p) (chi2S(p,testConc,testI1, testE).^2 + chi2S(p,testConc,testI2, testE).^2 + chi2S(p,testConc,testI3, testE).^2), p0)
mark wentink
on 18 Jun 2012
Walter Roberson
on 18 Jun 2012
Good point. I will need to think more about this.
Peter Perkins
on 22 Jun 2012
The usual way to do this is by using what are called "dummy variables". I can't tell what you mean by "I have several data sets (testI followed by a number)", so I'm going to take a guess at how to use a dummy variable in your case; you'll have to adapt it as appropriate.
I'll guess that you have two predictor variables, testC and testI, and one response, testE. Let's say you have two sets of those, with lengths n1 and n2. Create a new predictor variable
dummy = [ones(n1,1); 2*ones(n2,1)]
then concatenate the two testC's together, testI's together, testE's together. Now you have one big (n1+n2)x4 set of data: the three original but concatenated variables, and dummy. Your model is chi2S(p,testConc,testI, testE), I'll guess that inside of that you compute something in the form of
sum((testE - f(p,testC,testI)).^2)
To get "stratified" estimates of a and b, and a "pooled" estimate of K, you need to minimize
sum((testE(dummy==1) - f(p([1 3 5]),testC(dummy==1),testI(dummy==1))).^2) + sum((testE(dummy==2) - f(p([2 4 5]),testC(dummy==2),testI(dummy==2))).^2)
where p is now [a1 a2 b1 b2 K]. Pick starting values, pass this to fminsearch, and there you go. If your model really is this kind of response = f(parameters,predictors) form, I would strongly recommend that you use nlinfit, if you have access to the Statistics Toolbox, or lsqcurvefit, if you have access to the Optimization Toolbox.
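A minimal sketch of this dummy-variable setup, assuming a hypothetical linear model f = a*C + b*I + K and two synthetic data sets (your real model replaces f; the sizes and true values are made up for illustration):

```matlab
% Hypothetical model (assumption): f = a*C + b*I + K.
f = @(q, C, I) q(1)*C + q(2)*I + q(3);

% Two synthetic data sets of lengths n1 and n2 (assumption).
n1 = 30;  n2 = 40;
testC = [rand(n1,1); rand(n2,1)];
testI = [rand(n1,1); rand(n2,1)];
dummy = [ones(n1,1); 2*ones(n2,1)];
testE = zeros(n1+n2,1);
testE(dummy==1) = f([1.0 0.5 2.0], testC(dummy==1), testI(dummy==1));
testE(dummy==2) = f([1.5 0.8 2.0], testC(dummy==2), testI(dummy==2));

% Pooled objective over p = [a1 a2 b1 b2 K]:
% set 1 uses p([1 3 5]), set 2 uses p([2 4 5]), K = p(5) is shared.
obj = @(p) sum((testE(dummy==1) - f(p([1 3 5]), testC(dummy==1), testI(dummy==1))).^2) + ...
           sum((testE(dummy==2) - f(p([2 4 5]), testC(dummy==2), testI(dummy==2))).^2);

p0 = [0 0 0 0 0];
pbest = fminsearch(obj, p0);   % pbest(5) is the single pooled K
```

The same pattern extends to more sets: add one column per set to the stratified parameters and keep K as the last, shared entry.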
Hope this helps.
Sargondjani
on 23 Jun 2012
I think "several data sets (testI followed by a number)" just refers to their names.
Sargondjani
on 23 Jun 2012
To answer the original post: if you have N data sets, you just have to combine the objective functions for the N data sets, so you simply add the errors (I assume chi2S calculates the error). The inputs should then be N values each of a and b, and one K. All have to be put in one vector (dimension 2*N+1 by 1), say:
X0=[K0;a0(1:N,1);b0(1:N,1)] %i put 1:N to emphasize the dimensions
The objective would look like:
function err_tot = chi2S_tot(K,a,b,testConc,testI, testE) %dim. a & b = (N,1); note the output argument
err = zeros(N,1); %called err rather than error, which shadows a built-in
err = ..... %just calculate the error for each individual set, as you did before
err_tot = sum(err);
Now the call for fminsearch would be:
[X,c2]=fminsearch(@(X)chi2S_tot(X(1,1),X(2:N+1,1),X(N+2:end,1),......), X0);
K=X(1,1)
a=X(2:N+1,1)
b=X(N+2:end,1)
I don't know how many data sets you have, but the problem is that fminsearch is not well suited to optimizing more than a couple of variables, so you probably need the Optimization Toolbox (fminunc) to solve this in a reasonable amount of time... or you can write your own procedure (easier than it seems).
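A self-contained sketch of this stacked-vector approach, assuming a hypothetical per-set model I = a*exp(-b*Conc) + K and synthetic data (the model, grid, and true values are all made up; your chi2S goes in place of the inline objective):

```matlab
% Hypothetical per-set model (assumption): I = a*exp(-b*Conc) + K
N = 3;                          % number of data sets
conc = linspace(0, 5, 50)';     % shared concentration grid (assumption)
aTrue = [2; 3; 4];  bTrue = [0.5; 1; 1.5];  Ktrue = 0.7;
data = zeros(numel(conc), N);
for k = 1:N
    data(:,k) = aTrue(k)*exp(-bTrue(k)*conc) + Ktrue;
end

% Combined objective over X = [K; a(1:N); b(1:N)], dimension (2*N+1) x 1:
% column k of the prediction matrix is a(k)*exp(-b(k)*conc) + K.
chi2S_tot = @(X) sum(sum((data - (exp(-conc*X(N+2:end)') ...
                  .* repmat(X(2:N+1)', numel(conc), 1) + X(1))).^2));

X0 = [0; ones(N,1); ones(N,1)];
opts = optimset('MaxFunEvals', 1e5, 'MaxIter', 1e5);
X = fminsearch(chi2S_tot, X0, opts);
K = X(1);  a = X(2:N+1);  b = X(N+2:end);
```

With 2*N+1 = 7 free variables this is already near the comfortable limit for fminsearch, which is exactly the concern raised above.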
mark wentink
on 25 Jun 2012
Sargondjani
on 25 Jun 2012
Do you understand what I propose (it is nested optimization)?
- Make a main objective function with input K and output Y = sum(y).
- Inside this objective function you optimize y(set_i,1) = chi2S(...) for every set_i = 1:number_of_sets, given K (this gives you y as output and two optimized parameters for every set).
This way you cut the problem in two parts: optimizing a and b for every set, and K for all sets. The tricky part is making sure that your values of a and b from the last iteration (for every set) are stored, so you can use them again as starting values for the next iteration over K. Since you are a beginner, it is probably easiest to save and load them, but this will not be very fast.
If you don't understand, then please tell me which part you don't get (I tried hard to formulate it nicely, haha).
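A minimal sketch of this nested scheme, again assuming a hypothetical model I = a*exp(-b*Conc) + K with synthetic data; the warm starts for a and b are kept in a variable shared with a nested function instead of being saved to and loaded from disk:

```matlab
function [K, ab] = nestedFitDemo
% Hypothetical model (assumption): I = a*exp(-b*Conc) + K
conc = linspace(0, 5, 40)';
data = [2*exp(-0.5*conc) + 0.7, 3*exp(-1.2*conc) + 0.7];  % two synthetic sets
nSets = size(data, 2);
ab = ones(2, nSets);            % warm starts for [a; b], one column per set

K = fminsearch(@outerObj, 0);   % outer loop: optimize only the pooled K

    function Y = outerObj(Kcur)
        Y = 0;
        for s = 1:nSets
            inner = @(q) sum((data(:,s) - (q(1)*exp(-q(2)*conc) + Kcur)).^2);
            [ab(:,s), y] = fminsearch(inner, ab(:,s));  % warm-started inner fit
            Y = Y + y;          % sum of per-set minimized errors
        end
    end
end
```

Because outerObj is a nested function, it can update ab in place, so each new trial value of K starts the inner fits from the previous optimum.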
mark wentink
on 25 Jun 2012
Sargondjani
on 25 Jun 2012
You are much better at explaining what I tried to say, lol.
Anyway, it might take ages to converge... so good starting values for a and b (i.e. the values from the last iteration) are important to speed things up.
If computation time is too long, you could cut it by using parallel computing (start multiple workers and use parfor to optimize every set).
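Since the inner fits are independent once K is fixed, that nested scheme parallelizes per set. A sketch assuming the Parallel Computing Toolbox and the same kind of hypothetical model, I = a*exp(-b*Conc) + K (names and data are illustrative):

```matlab
function K = nestedFitParDemo
% Hypothetical model (assumption): I = a*exp(-b*Conc) + K
conc = linspace(0, 5, 40)';
data = [2*exp(-0.5*conc) + 0.7, 3*exp(-1.2*conc) + 0.7];  % synthetic sets
nSets = size(data, 2);
ab = ones(2, nSets);                  % per-set warm starts

K = fminsearch(@outerObj, 0);

    function Y = outerObj(Kcur)
        y = zeros(1, nSets);
        abLocal = ab;                 % sliced variable for the parfor body
        parfor s = 1:nSets            % inner fits run on separate workers
            inner = @(q) sum((data(:,s) - (q(1)*exp(-q(2)*conc) + Kcur)).^2);
            [abLocal(:,s), y(s)] = fminsearch(inner, abLocal(:,s));
        end
        ab = abLocal;                 % keep warm starts for the next K
        Y = sum(y);
    end
end
```

With only a few sets the parallel overhead may dominate; this pays off mainly when there are many sets or each inner fit is expensive.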
mark wentink
on 25 Jun 2012
Sargondjani
on 25 Jun 2012
you're welcome m8. sorry to hear about your data problem... that's just terrible!!