Slow/Non-Convergence for Large Nonlinear Programming Problem -- Better Way to Solve/Formulate?
I have implemented a program to solve

For this problem, the f_ij and tau_j are the variables to be optimized. qobs_jk is given data, and q_jk is calculated as

The MATLAB code to optimize this problem looks something like (thank you Matt J for the help!):
%Initial guesses
sol0.f=f_guess; %ni by np
sol0.tau=tau_guess; %np by 1
%declare optimization variables, bounds, and constraints
f=optimvar('f',[ni,np],'LowerBound',0, 'UpperBound',1);
tau=optimvar('tau',np,'LowerBound',0);
con.sumbound=sum(f,2)<=1;
[p,idx]=prob2matrices({f,tau},'Constraints',con,'sol0',sol0); %convert constraints into appropriate data structure (function by Matt J https://www.mathworks.com/matlabcentral/fileexchange/74481-prob2matrices-a-selective-version-of-prob2struct)
%%Optimization
fun=@(unknowns)CRMPobj(unknowns, idx, injection, production, time, Gamma); %this step creates the function to be optimized
options= optimoptions('fmincon', 'MaxFunctionEvaluations', 10000); %enforce max function evaluations (default is 3000)
[unknowns, obj]=fmincon(fun,p.x0,p.Aineq,p.bineq,[],[],p.lb,p.ub,[],options); %this step solves the optimization
The problem solves correctly and quickly for a simple example problem of ni=5, np=4, and nt=200. However, when moving to a real data set of ni=182, np=300, and nt=25, I fail to see any meaningful progress in the solution before the maximum number of function evaluations is reached (I am working on having MATLAB installed on my company's server, so RAM won't be an issue for long). I have a few remarks and guiding questions that may be useful in reaching a solution to this problem (forgive my lack of formal NLP knowledge):
- Before running out of memory, I am able to run fmincon() for approximately 20000 iterations. In this time, the solution obtained for f_ij and tau_j is barely (if at all) changed from the initial guesses, and the objective function, accordingly, doesn't change much. Is this lack of change likely (1) because the problem requires many more iterations to converge to a correct solution (hundreds of thousands, millions? does this become a concern of computational time?) or (2) because the function is getting stuck in a local minimum? If the latter:
- Is fmincon() the best way to solve this problem? Are there other local optimizers that are more useful or is a particular global optimizer recommended?
- In terms of converging upon a solution, is the number of constraints an issue, or is it more likely the number of variables being optimized? I am optimizing ni*np+np variables for ni constraints.
All in all, I am looking to better understand if my limitation in solving this problem for a real data set is due (1) to computation time and memory limits AND/OR (2) due to the nature of the way I have formulated this problem in MATLAB. Any insights into understanding the nature of this problem and how I may reach a solution is much appreciated.
2 Comments
Before running out of memory, I am able to run fmincon() for approximately 20000 iterations
It may be good to see what optimoptions you are actually using. With the code you have posted (MaxFunctionEvaluations = 10000), fmincon could not possibly run for 20000 iterations.
Matt J
on 22 Jul 2021
What are the initial conditions on the recursion that defines q_jk? What is q_j(k-1) when k=1?
Answers (1)
Before running out of memory, I am able to run fmincon() for approximately 20000 iterations.
It is concerning that you are running out of memory. The amount of consumed memory should normally not increase as the optimization progresses.
In this time, the solution obtained for f_ij and tau_j is barely (if at all) changed from the initial guesses, and the objective function, accordingly, doesn't change much.
It might be worth checking the gradients and the Hessian to see if they have vanishingly small values.
Is fmincon() the best way to solve this problem?
fmincon is the only local optimizer that will solve a problem of this form. A global optimizer (e.g., ga()) might be worth a try if an educated initial guess is hard to derive.
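If you do try ga(), it accepts the same linear constraints and bounds already assembled by prob2matrices, and you can seed it with your current initial guess. A minimal sketch, reusing fun and p from the code above (the option values are illustrative, not tuned):

```matlab
% ga() takes the same constraint matrices as fmincon; seeding the population
% with p.x0 guarantees at least one physically sensible starting individual.
nvars = numel(p.x0);
gaopts = optimoptions('ga', ...
    'InitialPopulationMatrix', p.x0(:).', ... % one seeded individual (row vector)
    'MaxGenerations', 200);                   % illustrative budget
[x, fval] = ga(fun, nvars, p.Aineq, p.bineq, [], [], p.lb, p.ub, [], gaopts);
```

Bear in mind that ga scales poorly to ni*np+np (~55,000) variables, so improving the initial guess for fmincon is likely the more practical route.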
In terms of converging upon a solution, are the number of constraints an issue, or is it likely the number of variables being optimized? I am optimizing ni*np+np variables for ni constraints.
My intuition says that it is a case of vanishing gradients. You'll recall that I had concerns about formulating the optimization in terms of tau, where the function has some very flat areas, as opposed to theta=exp(-delta_t/tau). The increase in the problem dimension may be exacerbating the problem.
17 Comments
Corey Hoydic
on 13 Jul 2021
Matt J
on 15 Jul 2021
Is this the case of vanishing gradients that you are mentioning?
The gradient is separate from the Hessian. However, on the subject of the Hessian, you should probably be using one of the fmincon option configurations that don't require you to compute the Hessian explicitly, e.g., HessianMultiplyFcn. Computing a full Hessian is only practical in low-dimensional problems.
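For instance, the interior-point algorithm can run without any explicit Hessian at all. A sketch of such a Hessian-free configuration (option values are illustrative):

```matlab
% Avoid forming an explicit Hessian: low-memory quasi-Newton approximation
% plus matrix-free (conjugate-gradient) subproblem solves.
options = optimoptions('fmincon', ...
    'Algorithm', 'interior-point', ...
    'SpecifyObjectiveGradient', true, ...  % supply your own gradient
    'HessianApproximation', 'lbfgs', ...   % no explicit Hessian stored
    'SubproblemAlgorithm', 'cg');          % avoids dense factorizations
```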
Corey Hoydic
on 22 Jul 2021
I don't know how you are assessing convergence. What are the options that you have set? If the iterations are not progressing, how do you know that there is any progress to be made? How do you know that the solution reached after 100 iterations is not a local optimum?
Corey Hoydic
on 22 Jul 2021
Matt J
on 22 Jul 2021
have not reached convergence or even a solution that makes physical sense
That suggests to me that you have not used an initial guess that makes physical sense. How are you choosing your initial point?
Corey Hoydic
on 22 Jul 2021
I'm not sure what inverse distance means. I notice that the minimization problem reduces to a linear least squares problem in f when the taus are known and fixed. You should probably use lsqlin to derive the initial guesses for the fij subject to whatever initial values you've chosen for tau. It would also be good to see the Display=iter output of the optimization.
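A sketch of that initialization, assuming a helper (here called buildCRMmatrix, which you would write) that assembles the linear system A*f(:) ≈ b implied by the model once the taus are fixed:

```matlab
% With tau fixed, q is linear in f, so minimize ||A*f(:) - b||^2 subject to
% the same bounds and row-sum constraint as the full problem.
tau0 = tau_guess;
[A, b] = buildCRMmatrix(injection, production, time, Gamma, tau0); % placeholder
lb = zeros(ni*np, 1);
ub = ones(ni*np, 1);
Aineq = repmat(speye(ni), 1, np);  % sum over j of f(i,j) <= 1 for each injector i
f0 = lsqlin(A, b, Aineq, ones(ni, 1), [], [], lb, ub);
f_guess = reshape(f0, ni, np);     % matches the [ni,np] optimvar layout
```

The Aineq construction assumes f(:) stacks the [ni,np] array column by column (MATLAB's default), so each row of Aineq picks out one injector's entries across all producers.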
Corey Hoydic
on 22 Jul 2021
Edited: Corey Hoydic
on 22 Jul 2021
Rather than using the Parallel Computing Toolbox, I think it would be more beneficial if you supply your own analytical gradient computations (SpecifyObjectiveGradient=true), and if possible your own Hessian computation (with HessianFcn).
As an alternative to supplying a Hessian, you could try one of the other fmincon algorithms that don't require a Hessian computation, like Active-Set or SQP. However, even in this case, I think you should at least provide an analytical gradient routine.
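A sketch of an objective that also returns the gradient; CRMPforward and CRMPjacobian are hypothetical helpers standing in for your forward model and its Jacobian:

```matlab
function [z, grad] = CRMPobjGrad(unknowns, idx, injection, production, time, Gamma)
    % Forward model and residuals; 'production' holds the observed qobs.
    q = CRMPforward(unknowns, idx, injection, time, Gamma);      % hypothetical
    r = production(:) - q(:);                                    % qobs - q
    z = r.' * r;                                                 % sum of squared residuals
    if nargout > 1
        J = CRMPjacobian(unknowns, idx, injection, time, Gamma); % hypothetical
        grad = -2 * (J.' * r);   % gradient of ||qobs - q||^2
    end
end
```

With this in place, set optimoptions('fmincon','SpecifyObjectiveGradient',true) so fmincon uses the second output instead of finite differences.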
Corey Hoydic
on 22 Jul 2021
Corey Hoydic
on 22 Jul 2021
Edited: Corey Hoydic
on 22 Jul 2021
since qobs_jk and Iik is real, measured data that is implicitly a function of tau and f
I don't see how measured data could be a function of anything. They are not calculated during the optimization, right? They act as constants during the optimization, don't they?
The gradient should be given by
gradient(tau,f) = -2*J(tau,f).'*(qobs - q(tau,f))
where J(tau,f) is the Jacobian of q with respect to tau and f.
Corey Hoydic
on 22 Jul 2021
Well, the key is to recognize that q can be written in matrix vector form as
q = f.'*I*M1(tau_1)*M2(tau_2)*M3(tau_3)*...*Mn(tau_n)
where the Mi(tau_i) are sparse upper triangular matrices. The above is a pretty easy form to take partial derivatives of.
For example, the Jacobian of q with respect to f.' will be,
Jf = kron((I*M1(tau_1)*M2(tau_2)*M3(tau_3)*...*Mn(tau_n)).', speye(np))
and the gradient of z with respect to f will be,
dzdf(:) = -2*I*M1(tau_1)*M2(tau_2)*M3(tau_3)*...*Mn(tau_n)*(qobs - q).'
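Once an analytic gradient like this is in place, it is worth validating it before trusting long runs; fmincon can compare a supplied gradient against finite differences at the initial point. A sketch:

```matlab
% One-off sanity check of the analytic gradient (slow; disable after it passes).
options = optimoptions('fmincon', ...
    'SpecifyObjectiveGradient', true, ...
    'CheckGradients', true, ...            % compare to finite differences
    'FiniteDifferenceType', 'central');    % more accurate comparison
```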
Corey Hoydic
on 23 Jul 2021
Edited: Corey Hoydic
on 24 Jul 2021