Alternatives to gradient-based optimization algorithms like lsqnonlin that handle at least O(10) parameters?
I am fitting parameters of a PDE to experimental data using the function lsqnonlin from the Optimization Toolbox.
I observed that I can successfully optimize 10 to 20 parameters, but not more. My goal, however, is to optimize 50 to 100 parameters.
Based on my experience with lsqnonlin (and fmincon and friends), it seems that this class of optimization algorithms handles a small number of parameters (10 to 20) well, but is no longer appropriate when there are many more parameters.
Fast convergence and computation time are not important to me.
I come from an engineering background and do not have deep knowledge of other classes of optimization algorithms. That said, I would appreciate it if someone could give me some keywords or recommendations for alternative optimization algorithms that are designed to handle a larger number of parameters.
Thank you!
18 Comments
Torsten
on 24 Feb 2023
Edited: Torsten
on 24 Feb 2023
I observed that I can successfully optimize 10,...,20 parameters but not more.
How did you observe this? The optimizers are not limited in the number of fitting parameters they can handle. Whether fitting models with up to 100 parameters makes sense depends on the number and quality of your input data.
SA-W
on 24 Feb 2023
How did you observe this ?
My experimental data are numerically generated with a given set of parameters. The parameters are the values of a function at given interpolation points. If I distribute, say, 10 parameters with equally spaced interpolation points over the interval [a,b], lsqnonlin perfectly re-identifies the 10 parameters. However, if I distribute 20 parameters over the same interval [a,b], lsqnonlin re-identifies some of the parameters incorrectly. That is why I think that lsqnonlin is no longer appropriate for my problem.
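For readers without the toolbox, this kind of setup can be sketched in Python/SciPy (a hypothetical stand-in: the real problem calls a PDE solver, which is not reproduced here; the unknowns are function values at equally spaced nodes, and the data are samples of the resulting interpolant):

```python
import numpy as np
from scipy.optimize import least_squares

# Unknowns: values of a function at 10 equally spaced interpolation
# points on [a, b] = [0, 1] (names and setup are illustrative only).
nodes = np.linspace(0.0, 1.0, 10)
p_true = nodes**2                          # node values of a convex function
x_data = np.linspace(0.0, 1.0, 1500)      # 1500 synthetic "measurements"
y_data = np.interp(x_data, nodes, p_true)

def residual(p):
    # misfit between the interpolant defined by p and the data
    return np.interp(x_data, nodes, p) - y_data

sol = least_squares(residual, x0=np.zeros(10))
```

With 10 well-separated nodes and 1500 data points this toy problem is linear and well-posed, so the node values are recovered essentially exactly; whether that carries over to a PDE-constrained misfit depends on the identifiability of the real problem.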
Matt J
on 24 Feb 2023
Edited: Matt J
on 24 Feb 2023
However, if I distribute 20 parameters over the same interval [a,b], lsqnonlin reidentifies some of the parameters wrong.
How have you assessed whether they are "wrong"? What is the final objective function value, and how does it compare to the objective value of the "right" parameters?
If both solutions have the same objective value (or nearly so), then they must both be considered correct.
SA-W
on 24 Feb 2023
How have you assessed whether they are "wrong"? What is the final objective function value, and how does it compare to the objective value of the "right" parameters?
The objective function value is indeed on the order of the objective value corresponding to the right parameters; however, the parameters are clearly wrong, which I observed by plotting the function.
The parameters that I am optimizing (the function values) have to represent a convex function in 1d. However, the 20 parameters found by lsqnonlin do not define a convex function, but rather a function that is even decreasing over certain ranges and which, for whatever reason, is also a minimum of the objective function.
Let me ask two questions:
(i) Is there a recommended ratio (just roughly) between the number of parameters and the number of equations? Currently, my data vectors have 1500 entries and I optimize 20 parameters.
(ii) I am considering enforcing convexity through the following constraints on the parameters p(i):
p(i-1) - 2p(i) + p(i+1) > 0 for all interior parameters
Is there a way to translate these constraints for lsqnonlin, for instance by adding a penalty term:
f = ||r||^2 + penalty term?
Torsten
on 24 Feb 2023
However, the 20 parameters found by lsqnonlin do not define a convex function, but rather a function which is at certain ranges even decreasing, which, for whatever reasons, is also a minimum of the objective function.
Then you must somehow define convexity of the function as a constraint.
Matt J
on 24 Feb 2023
Is there a way to retranslate this constraints for lsqnonlin, for instance by adding a penalty term to
If your penalty is quadratic, then it can be implemented just by expanding the residual vector:
r=[weight1*r; weight2*constraint_violation];
f = ||r||^2
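For illustration outside MATLAB, the same residual-expansion idea can be sketched with SciPy's least_squares (a sketch only: the data misfit below is a made-up stand-in for the expensive PDE residual, and the weights are illustrative):

```python
import numpy as np
from scipy.optimize import least_squares

nodes = np.linspace(0.0, 1.0, 10)
x = np.linspace(0.0, 1.0, 200)
y_meas = np.interp(x, nodes, nodes**2)      # stand-in "experimental" data

def data_residual(p):
    # stand-in for the expensive PDE-based misfit r
    return np.interp(x, nodes, p) - y_meas

def augmented_residual(p, w_data=1.0, w_con=1e2):
    d2 = p[:-2] - 2.0*p[1:-1] + p[2:]       # central second differences
    violation = np.minimum(d2, 0.0)         # nonzero only where convexity fails
    # quadratic penalty realized by stacking extra residual entries
    return np.concatenate([w_data * data_residual(p), w_con * violation])

sol = least_squares(augmented_residual, x0=np.zeros(10))
```

Because the extra entries enter the sum of squares like any other residual, their squared norm is exactly a quadratic penalty on the constraint violation.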
SA-W
on 24 Feb 2023
If your penalty is quadratic, then it can be implemented just by expanding the residual vector
c(i) := [p(i-1) - 2p(i) + p(i+1)]/h^2 > 0 at the interpolation point corresponding to p(i)
This inequality constraint is based on the central finite-difference approximation of the second derivative of a function. Since my parameters are function values, I thought it would be a good idea to implement these constraints in the following way:
if p(i-1) - 2*p(i) + p(i+1) >= 0
    c(i) = 0;
else
    c(i) = p(i-1) - 2*p(i) + p(i+1);
end
Do you think it is reasonable to enforce convexity in such a way, or can you recommend a better solution?
I know that this question is not directly related to MATLAB, but I really appreciate your help!
Matt J
on 24 Feb 2023
Edited: Matt J
on 24 Feb 2023
Do you think it is reasonable to enforce convexity in such a way or can you recommend a better solution?
Seems reasonable to me, but I would probably use fmincon instead of lsqnonlin, since then I can directly implement
p(i-1) - 2*p(i) + p(i+1) >= 0
as a linear inequality constraint.
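In matrix form (a Python sketch for illustration, not the MATLAB call), the convexity conditions become rows of a linear inequality system A*p <= b, which is exactly the form an fmincon-style solver accepts directly:

```python
import numpy as np

n = 10
# Row i encodes -(p(i-1) - 2 p(i) + p(i+1)) <= 0, i.e. A @ p <= b,
# which is the convexity condition at interior node i.
A = np.zeros((n - 2, n))
for i in range(1, n - 1):
    A[i - 1, i - 1] = -1.0
    A[i - 1, i]     =  2.0
    A[i - 1, i + 1] = -1.0
b = np.zeros(n - 2)

p_convex  = np.linspace(0.0, 1.0, n) ** 2      # convex: satisfies A @ p <= b
p_concave = np.sqrt(np.linspace(0.0, 1.0, n))  # concave: violates some rows
```

Note the sign flip: the ">= 0" convexity condition becomes "<= 0" rows because solvers conventionally take inequalities in the form A*x <= b.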
Matt J
on 24 Feb 2023
If using lsqnonlin, I might instead have
if p(i-1) - 2*p(i) + p(i+1) >= 0
    c(i) = 0;
else
    c(i) = abs(p(i-1) - 2*p(i) + p(i+1))^1.5;
end
since then c will be a differentiable function of p. lsqnonlin assumes c to be differentiable.
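A quick numerical check (a Python sketch, for illustration) of why the 1.5 exponent helps: the one-sided difference quotient of the smoothed violation vanishes as it approaches t = 0 from the left, matching the zero slope on the right, whereas the unsmoothed version has a kink there:

```python
def c_smooth(t):
    # 0 for t >= 0, |t|^1.5 for t < 0  (the smoothed violation)
    return abs(min(t, 0.0)) ** 1.5

def c_kinked(t):
    # 0 for t >= 0, |t| for t < 0  (the unsmoothed violation)
    return abs(min(t, 0.0))

h = 1e-8
# difference quotients approaching t = 0 from the left
slope_smooth = (c_smooth(0.0) - c_smooth(-h)) / h  # -> 0: matches right slope
slope_kinked = (c_kinked(0.0) - c_kinked(-h)) / h  # -> -1: kink at t = 0
```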
SA-W
on 25 Feb 2023
if p(i-1) - 2*p(i) + p(i+1) >= 0, c(i) = 0; else, c(i) = p(i-1) - 2*p(i) + p(i+1); end
Here, c would be a non-differentiable function with respect to the parameters?
SA-W
on 25 Feb 2023
And what term is it that makes c(i) = abs(p(i-1) - 2*p(i) + p(i+1))^1.5
differentiable for all parameters? The exponent ^1.5 or the abs?
Matt J
on 25 Feb 2023
Edited: Matt J
on 25 Feb 2023
The abs in |t|^1.5 doesn't really matter if t^1.5 is only ever evaluated for positive t. As seen below, they are the same function with or without the abs. The differentiability follows from the fact that both the left and right derivatives of c(t) = max(0,|t|^1.5) are zero at t = 0. Conversely, c(t) = max(0,t) has mismatched right and left derivatives at t = 0.
syms t real
c1 = piecewise(t<=0, 0, t>0, abs(t)^1.5);
c2 = piecewise(t<=0, 0, t>0, t^1.5);
fplot(c1,[-1,1]); hold on
fplot(c2,[-1,1],'x'); hold off
Matt J
on 26 Feb 2023
Edited: Matt J
on 26 Feb 2023
You could do that, or even have c(i) = (p(i-1) - 2*p(i) + p(i+1))^200. However, with larger exponents, the gradient of the cost function becomes small over a wider neighborhood of c = 0. Smaller gradients mean slower convergence, and also a chance that the OptimalityTolerance triggers prematurely. Conversely, with smaller exponents (but still greater than 1), the gradient becomes less smooth. So there is a trade-off to be reckoned with.
SA-W
on 6 Mar 2023
Edited: SA-W
on 6 Mar 2023
I implemented
r = [weight1*r; weight2*constraint_violation];
f = ||r||^2
with lsqnonlin and found that
weight2 = 1e4 % approximately
is necessary for the inequality constraints to be considered at all at intermediate iterations; with weight2 below or above 1e4, the constraint_violation is either not sufficiently included or dominates the residual r.
If I implement the above inequality constraints in fmincon and multiply A*x <= b by 1e4 (this multiplication is necessary for fmincon to pay attention to the constraints), fmincon gives a better solution than lsqnonlin. By better, I mean that the solution is nearly identical to the exact solution. The lsqnonlin solution fulfills the constraints in A; however, it is not as accurate.
Anyway, fmincon requires two or three times as many iterations as lsqnonlin to find the solution. That said, I would like to stick with lsqnonlin because the evaluation of the objective function is quite expensive in my case.
I read the documentation on the interior-point algorithm. The way fmincon incorporates the constraints is based on a merit function, which is, of course, different from just expanding the residual vector as I did.
Admittedly, I do not have enough background in optimization. Can you give me another recommendation (closer to the way fmincon incorporates the constraints) for implementing my constraints in lsqnonlin?
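One standard alternative to a single hand-tuned weight, not proposed in the thread itself, is penalty continuation: solve a sequence of penalized least-squares problems with increasing constraint weight, warm-starting each solve from the previous solution. A Python/SciPy sketch with a made-up stand-in residual (all names and data here are illustrative):

```python
import numpy as np
from scipy.optimize import least_squares

nodes = np.linspace(0.0, 1.0, 10)
x = np.linspace(0.0, 1.0, 200)
y_meas = np.sin(np.pi * x)                  # concave stand-in data

def augmented(p, w_con):
    d2 = p[:-2] - 2.0*p[1:-1] + p[2:]       # convexity violated where d2 < 0
    violation = np.minimum(d2, 0.0)
    data = np.interp(x, nodes, p) - y_meas  # stand-in for the PDE misfit
    return np.concatenate([data, w_con * violation])

p = np.zeros(10)
for w in [1.0, 1e2, 1e4]:                   # increasing penalty weights
    p = least_squares(lambda q: augmented(q, w), p).x
```

The early, mildly penalized solves guide the iterates toward the data fit, and the final heavily penalized solve then only needs a small correction to satisfy the constraints, which tends to be more robust than starting cold with a huge weight.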
Thank you!
Accepted Answer
Matt J
on 24 Feb 2023
Edited: Matt J
on 24 Feb 2023
it seems that this class of optimization algorithms can handle a small number of parameters (10,...,20) well, but are not appropriate anymore if there are many more parameters.
I am currently using lsqnonlin in optimization problems with 11796480 variables. It converges fine, though in my case the line search operations really do make it much slower than I would like. In any case, 20 variables is definitely not the upper limit.
I am fitting parameters of a PDE to experimental data utilizing the function lsqnonlin from the optimization toolbox.
Make sure you are following the guidelines here,
9 Comments
SA-W
on 24 Feb 2023
Thank you for the reference.
Currently, if my PDE solver does not solve the PDE for parameters p, I programmed a return statement. But based on this article, returning NaN is the right way to go, isn't it?
SA-W
on 24 Feb 2023
One question regarding the treatment of NaN.
As I said, so far I have used a return statement when my PDE solver failed to converge for given parameters.
If I use NaN instead, lsqnonlin tries a different parameter set. But if this returns NaN again, is another set of parameters tried? In other words, how is an endless loop of function evaluations prevented?
Matt J
on 24 Feb 2023
Of that I am not certain. I see nothing about it in the documentation.
I'm not sure it matters for you, though, since as discussed above, you have complicated constraints that are better handled with fmincon.
SA-W
on 24 Feb 2023
In the link you gave me, it is said that the solvers in lsqnonlin can recover from NaN.
SA-W
on 1 Mar 2023
If someone comes across our discussion here: the algorithms in lsqnonlin can also recover from NaN.
More Answers (1)
Matt J
on 26 Feb 2023
Edited: Matt J
on 26 Feb 2023
The parameters that I am optimizing (the function values) have to represent a convex function in 1d.
Another strategy that I would suggest is to first fit a constrained curve to your function samples using this FEX download,
This tool allows you both to constrain the fit to be convex (using the 'ConcaveUp' option) and to compute the derivatives of the fit. Once you have the curve fit and its derivatives, you can substitute them into your ODE to obtain conventional equations in the original set of unknown parameters. You can then solve these equations using lsqnonlin, or even a linear solver, depending on the form of your equations.
7 Comments
SA-W
on 26 Feb 2023
This approach is not really clear to me yet.
My objective function is the difference between a measured displacement field and a simulated displacement field. The latter depends somehow on the convex 1d function. The nodal values of that function are my parameters to lsqnonlin.
That said, do you think I can use your suggested tool?
Matt J
on 27 Feb 2023
I don't know. We haven't seen a full problem description. What is the role of the ODE? Does it describe the 1D convex function, or does it describe the displacement field? My ultimate point is that it would be wise to avoid making the objective function depend on repeated calls to an ODE solver, if possible.
SA-W
on 27 Feb 2023
Edited: SA-W
on 27 Feb 2023
The solution of the ODE is the displacement field (computed via the finite element method), which depends on the 1d convex function. The nodal values of that convex function are the parameters of the optimization, i.e., the parameters should be changed such that measured and simulated displacements match in a least-squares sense. So the start values passed to lsqnonlin represent the initial shape of the convex function. In other words, I do not have samples of that convex function, as my goal is to identify this function.
In order to use the linked FEX download, I need to know samples of the convex function, right?
Matt J
on 27 Feb 2023
My suggestion is only relevant if you can do a spline fit to the thing your ODE is taking derivatives of.
SA-W
on 27 Feb 2023
Would be nice to try that, but let me clarify one point:
Say the 1d convex function (=parameters) is called f, and displacements are denoted as u. Then the objective function is given by
obj. = |u^sim(f) - u^exp|^2
The Jacobian is calculated according to
J = du^sim(f) / df
My suggestion is only relevant if you can do a spline fit to the thing your ODE is taking derivatives of.
Do you mean the Jacobian here? I could do a spline fit to f, but not to the displacements. The displacement is a finite element field.
Matt J
on 6 Mar 2023
I don't know enough about finite element analysis to know why you cannot fit a spline to a finite element field.
SA-W
on 6 Mar 2023
Would you support me further regarding the issue in my new comment (in the comments under the question)?