Slow/Non-Convergence for Large Nonlinear Programming Problem -- Better Way to Solve/Formulate?

I have implemented a program to solve
For this problem, the f_ij's and tau_j's are the variables to be optimized. qobs_jk is given data, and q_jk is calculated as
The MATLAB code to optimize this problem looks something like (thank you Matt J for the help!):
%Initial guesses
sol0.f=f_guess; %ni by np
sol0.tau=tau_guess; %np by 1
%declare optimization variables, bounds, and constraints
f=optimvar('f',[ni,np],'LowerBound',0, 'UpperBound',1);
tau=optimvar('tau',np,'LowerBound',0);
con.sumbound=sum(f,2)<=1;
[p,idx]=prob2matrices({f,tau},'Constraints',con,'sol0',sol0); %convert constraints into appropriate data structure (function by Matt J https://www.mathworks.com/matlabcentral/fileexchange/74481-prob2matrices-a-selective-version-of-prob2struct)
%%Optimization
fun=@(unknowns)CRMPobj(unknowns, idx, injection, production, time, Gamma); %this step creates the function to be optimized
options= optimoptions('fmincon', 'MaxFunctionEvaluations', 10000); %enforce max function evaluations (default is 3000)
[unknowns, obj]=fmincon(fun,p.x0,p.Aineq,p.bineq,[],[],p.lb,p.ub,[],options); %this step solves the optimization
The problem solves correctly and quickly for a small example problem of ni=5, np=4, and nt=200. However, when moving to a real data set of ni=182, np=300, and nt=25, I fail to see any meaningful progress in the solution before the maximum number of function evaluations is reached (I am working on having MATLAB installed on my company's server, so RAM won't be an issue for long). I have a couple of remarks and guiding questions that may be useful in reaching a solution (forgive my lack of formal NLP knowledge):
  1. Before running out of memory, I am able to run fmincon() for approximately 20000 iterations. In this time, the solution obtained for f_ij and tau_j barely (if at all) changes from the initial guesses, and the objective function, accordingly, doesn't change much. Is this lack of change likely (1) because the problem requires many more iterations to converge to a correct solution (hundreds of thousands? millions? does this become a concern of computational time?) or (2) because the function is getting stuck in a local minimum? If the latter:
  2. Is fmincon() the best way to solve this problem? Are there other local optimizers that are more useful or is a particular global optimizer recommended?
  3. In terms of converging on a solution, is the number of constraints an issue, or is it more likely the number of variables being optimized? I am optimizing ni*np+np variables subject to ni constraints.
All in all, I am looking to better understand whether my limitation in solving this problem for a real data set is due (1) to computation time and memory limits AND/OR (2) to the way I have formulated this problem in MATLAB. Any insights into the nature of this problem and how I may reach a solution are much appreciated.

2 Comments

Before running out of memory, I am able to run fmincon() for approximately 20000 iterations
It may be good to see what optimoptions you are actually using. With the code you have posted (MaxFunctionEvaluations = 10000), fmincon could not possibly run for 20000 iterations.
What are the initial conditions on the recursion that defines q_jk? What is q_j(k-1) when k=1?


Answers (1)

Before running out of memory, I am able to run fmincon() for approximately 20000 iterations.
It is concerning that you are running out of memory. The amount of consumed memory should normally not increase as the optimization progresses.
In this time, the solution obtained for f_ij and tau_j is barely (if at all) changed from the initial guesses, and the objective function, accordingly, doesn't change much.
It might be worth checking the gradients and the Hessian to see if they have vanishingly small values.
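One way to watch for that during the run is an OutputFcn that logs the first-order optimality measure every iteration (a sketch; logOptimality is a name I'm making up, and the optimValues fields are the ones documented for fmincon output functions):

```matlab
% Sketch: an OutputFcn that logs the first-order optimality measure each
% iteration, to distinguish "needs more iterations" from "flat gradient".
function stop = logOptimality(x, optimValues, state)
stop = false;  % never request early termination
if strcmp(state, 'iter')
    fprintf('iter %5d  fval = %.6g  firstorderopt = %.3g  step = %.3g\n', ...
        optimValues.iteration, optimValues.fval, ...
        optimValues.firstorderopt, optimValues.stepsize);
end
end
```

Hook it up with options = optimoptions(options, 'OutputFcn', @logOptimality); a firstorderopt that is already tiny at the initial point would point to a flat region rather than a memory or iteration-count problem.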
Is fmincon() the best way to solve this problem?
fmincon is the only local optimizer that will solve a problem of this form. A global optimizer might be worth a try (e.g., ga()) if an educated initial guess is hard to derive.
In terms of converging upon a solution, are the number of constraints an issue, or is it likely the number of variables being optimized? I am optimizing ni*np+np variables for ni constraints.
My intuition says that it is a case of vanishing gradients. You'll recall that I had concerns about formulating the optimization in terms of tau, where the function has some very flat areas, as opposed to theta=exp(-delta_t/tau). The increase in the problem dimension may be exacerbating the problem.
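The flat areas can be made concrete (an illustration only, with an arbitrary delta_t): theta = exp(-delta_t/tau) has derivative dtheta/dtau = (delta_t/tau^2)*exp(-delta_t/tau), which collapses toward zero for both very small and very large tau, so gradient steps in tau barely move the model there:

```matlab
% Illustration: the sensitivity of theta = exp(-dt/tau) to tau vanishes at
% both extremes of tau, which starves gradient-based steps in tau.
dt  = 1;
tau = logspace(-2, 3, 6);                      % tau from 0.01 to 1000
dtheta_dtau = (dt ./ tau.^2) .* exp(-dt ./ tau);
disp([tau; dtheta_dtau])                       % second row -> ~0 at both ends
```

Optimizing over theta directly sidesteps those dead zones.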

17 Comments

Hello again, Matt! Good to hear from you. Let me try to address some of your points below:
It may be good to see what optimoptions you are actually using. With the code you have posted (MaxFunctionEvaluations = 10000), fmincon could not possibly run for 20000 iterations.
I should specify that, originally, I received the error "Requested 54900x54900 (22.5GB) array exceeds maximum array size preference (15.8GB)." from the command eye(np*ni) executed within fmincon(). I was able to get around this by unchecking "Limit the maximum array size to a percentage of RAM" in my preferences. This allowed me to avoid the above error and the subsequent crash for up to about 20000 iterations (30000 gave an "Out of Memory" error). My options are as follows:
fmincon options:
Options used by current Algorithm ('interior-point'):
(Other available algorithms: 'active-set', 'sqp', 'sqp-legacy', 'trust-region-reflective')
Set properties:
MaxFunctionEvaluations: 10000
Default properties:
Algorithm: 'interior-point'
BarrierParamUpdate: 'monotone'
CheckGradients: 0
ConstraintTolerance: 1.0000e-06
Display: 'final'
FiniteDifferenceStepSize: 'sqrt(eps)'
FiniteDifferenceType: 'forward'
HessianApproximation: 'bfgs'
HessianFcn: []
HessianMultiplyFcn: []
HonorBounds: 1
MaxIterations: 1000
ObjectiveLimit: -1.0000e+20
OptimalityTolerance: 1.0000e-06
OutputFcn: []
PlotFcn: []
ScaleProblem: 0
SpecifyConstraintGradient: 0
SpecifyObjectiveGradient: 0
StepTolerance: 1.0000e-10
SubproblemAlgorithm: 'factorization'
TypicalX: 'ones(numberOfVariables,1)'
UseParallel: 0
It may be good to see what optimoptions you are actually using. With the code you have posted (MaxFunctionEvaluations = 10000), fmincon could not possibly run for 20000 iterations.
See the text above. I tested running 10000 and 20000 iterations (it appears I changed it back to 10000 in the posted code). More iterations would have led to an "out of memory" error. The results for 10000 and 20000 iterations were both similar to each other and relatively unchanged from the initial guesses.
It might be worth checking the gradients and the Hessian to see if they have vanishingly small values.
After iteration 10000, the Hessian approximation appears, strangely, to be the identity matrix. It is exactly the identity matrix -- ones on the main diagonal and exactly zero elsewhere. What exactly does this indicate? Is this the case of vanishing gradients that you are mentioning?
fmincon is the only local optimizer that will solve a problem of this form. A global optimizer might be worth a try (e.g., ga()) if an educated initial guess is hard to derive.
I may look into the Global Optimization Toolbox if all else fails. Hopefully the cost will be approved if it is needed. The implementation looks relatively similar to fmincon(). I have implemented the algorithms for initial guesses as suggested in the literature, but it is possible that they may not be applicable to such a large problem.
My intuition says that it is a case of vanishing gradients. You'll recall that I had concerns about formulating the optimization in terms of tau, where the function has some very flat areas, as opposed to theta=exp(-delta_t/tau). The increase in the problem dimension may be exacerbating the problem.
I may look into running this formulation once my company gets MATLAB up on our server. I will have to do some data pre-processing so that delta_t is the same for all timesteps, ensuring that the tau back-calculated at the end is correct.
If the Hessian is in fact the problem, what are your recommendations to finding a solution? I will try the theta formulation this afternoon or tomorrow morning (backed up with a few other things!) and perhaps look into global optimization if we cannot get there locally.
Is this the case of vanishing gradients that you are mentioning?
The gradient is separate from the Hessian. However, on the subject of the Hessian, you should probably be using one of the fmincon option configurations that don't require you to compute the Hessian explicitly, e.g., HessianMultiplyFcn. Computing a Hessian explicitly is only practical in low-dimensional problems.
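For the interior-point algorithm specifically, the documented way to avoid a dense quasi-Newton matrix of size (ni*np+np)-by-(ni*np+np) is the limited-memory option (the memory length 10 below is an arbitrary choice):

```matlab
% Switch the interior-point quasi-Newton update from dense BFGS to
% limited-memory BFGS; {'lbfgs', m} keeps only m curvature pairs instead
% of a dense (ni*np+np)-by-(ni*np+np) matrix.
options = optimoptions('fmincon', ...
    'Algorithm', 'interior-point', ...
    'HessianApproximation', {'lbfgs', 10}, ...
    'MaxFunctionEvaluations', 1e4);
```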
Hello Matt,
My apologies for now just getting back to you. Some comments on the last week of progress:
I did indeed implement the theta formulation theta=exp(-delta_t/tau) for my problem. Based on benchmarking on smaller problems that do converge, the time to convergence is reduced by approximately 50%. However, for the larger problem that I am currently running, I am on the 4th day of running (approximately 40 million function evaluations), so we are talking days (at a minimum) to reach convergence.
The problem I am optimizing has np=300 and ni=15, so we are talking np*ni+np=4800 variables to optimize here. The problem is proving to be quite slow and perhaps not solvable in a reasonable amount of time.
Do you have any knowledge as to what options I may be able to tweak within fmincon() to speed up convergence? I suppose the major question I am asking is "how much can convergence speed be tweaked by changing the options of fmincon() and how much is unresolvable due to the nature (size) of the problem itself?"
Let me also address your previous comment: "You should probably be using one of the fmincon option configurations that don't require you to compute Hessian explicitly, e.g., HessianMultiplyFcn. Computing a Hessian is only practical in low dimensional problems." Currently, I have HessianMultiplyFcn set to [], and the algorithm fmincon() is using is interior-point with HessianApproximation set to 'bfgs'. Would you recommend that I use 'lbfgs' or 'finite-difference' for HessianApproximation instead, since these do not form a full Hessian?
Thanks again for your help.
I don't know how you are assessing convergence. What are the options that you have set? If the iterations are not progressing, how do you know that there is any progress to be made? How do you know that the solution reached after 100 iterations is not locally optimum?
I should have specified -- I constrained the problem a bit (from np=300 and ni=182 to np=300 and ni=15) and now the problem does progress. I am assessing convergence by meeting the criteria "fmincon stopped because the size of the current step is less than the value of the step size tolerance and constraints are satisfied to within the value of the constraint tolerance." For a smaller problem, say, np=30 and ni=15, this converges in about 10 minutes. For np=300 and ni=15, I have been running the problem for about 4 days now and have not reached convergence or even a solution that makes physical sense. The options I am using are as follows:
fmincon options:
Options used by current Algorithm ('interior-point'):
(Other available algorithms: 'active-set', 'sqp', 'sqp-legacy', 'trust-region-reflective')
Set properties:
MaxFunctionEvaluations: 10000000
MaxIterations: 1000000
Default properties:
Algorithm: 'interior-point'
BarrierParamUpdate: 'monotone'
CheckGradients: 0
ConstraintTolerance: 1.0000e-06
Display: 'final'
FiniteDifferenceStepSize: 'sqrt(eps)'
FiniteDifferenceType: 'forward'
HessianApproximation: 'bfgs'
HessianFcn: []
HessianMultiplyFcn: []
HonorBounds: 1
ObjectiveLimit: -1.0000e+20
OptimalityTolerance: 1.0000e-06
OutputFcn: []
PlotFcn: []
ScaleProblem: 0
SpecifyConstraintGradient: 0
SpecifyObjectiveGradient: 0
StepTolerance: 1.0000e-10
SubproblemAlgorithm: 'factorization'
TypicalX: 'ones(numberOfVariables,1)'
UseParallel: 0
have not reached convergence or even a solution that makes physical sense
That suggests to me that you have not used an initial guess that makes physical sense. How are you choosing your initial point?
I am implementing initial guesses as suggested by the literature. The f_ij initial guesses are calculated using inverse distance, whereas the tau_j's are the output of a smaller optimization problem in which all of the q_jk and I_ik are lumped into single values (np=1, ni=1) q_k and I_k.
I'm not sure what inverse distance means. I notice that the minimization problem reduces to a linear least-squares problem in f when the taus are known and fixed. You should probably use lsqlin to derive the initial guesses for the f_ij, subject to whatever initial values you've chosen for tau. It would also be good to see the Display='iter' output of the optimization.
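A sketch of the lsqlin idea, assuming (based on the matrix form q = f.'*I*M1(tau_1)*...*Mn(tau_n) discussed later in this thread) that A = I*M1(tau_1)*...*Mn(tau_n) has already been built for the fixed initial taus:

```matlab
% Sketch: initial f by linear least squares with tau fixed.
% q = f.'*A  =>  q.' = A.'*f  =>  vec(q.') = kron(eye(np), A.')*f(:), so
% the fit is one bound-constrained lsqlin over all ni*np entries of f.
C     = kron(speye(np), A.');             % (np*nt)-by-(np*ni) design matrix
d     = reshape(qobs.', [], 1);           % stacked observed rates
Aineq = kron(ones(1, np), speye(ni));     % sum over j of f(i,j) <= 1
bineq = ones(ni, 1);
lb    = zeros(ni*np, 1);
ub    = ones(ni*np, 1);
fvec  = lsqlin(C, d, Aineq, bineq, [], [], lb, ub);
f0    = reshape(fvec, ni, np);            % matches optimvar('f',[ni,np])
```

The inequality rows mirror the sum(f,2)<=1 constraint from the original formulation, so the initial guess is feasible by construction.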
Inverse distance means that initial guesses for fij are weighted higher for i and j that are closer to each other (we are talking oil and gas wells, so it is a good first assumption to assume closer wells are more connected).
Using lsqlin to derive initial guesses sounds like a good idea. I must say that I am unsure how to formulate the objective function in terms of "C" and "d" (my fmincon is formulated as fmincon(fun,p.x0,p.Aineq,p.bineq,[],[],p.lb,p.ub,[],options)).
I currently have a run going, but I will share with you the 'display', 'iter' output for the next run, likely tomorrow or Saturday (I am running 10^7 MaxFunctionEvaluations per day).
I may also look into how much the parallel computing toolbox will speed up this problem. I do not have access to this toolbox at work so I will have to try this problem at home.
Rather than using the Parallel Computing Toolbox, I think it would be more beneficial if you supply your own analytical gradient computations (SpecifyObjectiveGradient=true), and if possible your own Hessian computation (with HessianFcn).
As an alternative to supplying a Hessian, you could try one of the other fmincon algorithms that don't require a Hessian computation, like Active-Set or SQP. However, even in this case, I think you should at least provide an analytical gradient routine.
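Mechanically, supplying a gradient just means the objective returns a second output and SpecifyObjectiveGradient is set; a toy sketch with a stand-in least-squares objective (the real version would return the CRMP objective and its analytic gradient):

```matlab
% Toy sketch of SpecifyObjectiveGradient: fmincon requests the second
% output (the gradient) only when the option is enabled.
c    = [1; 2; 3];
fun  = @(x) deal(sum((x - c).^2), 2*(x - c));   % [objective, gradient]
opts = optimoptions('fmincon', 'SpecifyObjectiveGradient', true);
xopt = fmincon(fun, zeros(3, 1), [], [], [], [], [], [], [], opts);
```

deal lets one anonymous function supply both outputs; a named function with an `if nargout > 1` branch does the same for anything nontrivial.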
Admittedly, I do not have the understanding to know what goes into supplying a gradient. Is this a rudimentary calculation or does it require some a priori knowledge for the specific problem?
I don't know if analytically calculating the gradient is possible, since qobs_jk and I_ik are real, measured data that are implicitly a function of tau and f. Finite differencing would be needed anyway, I'd assume.
since qobs_jk and I_ik are real, measured data that are implicitly a function of tau and f
I don't see how measured data could be a function of anything. They are not calculated during the optimization, right? They act as constants during the optimization, don't they?
The gradient should be given by
gradient(tau,f) = -2*J(tau,f).'*(qobs - q(tau,f))
where J(tau,f) is the Jacobian of q with respect to tau and f.
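Whatever form the analytic gradient ends up taking, it's worth spot-checking it against central finite differences at a test point before trusting it inside fmincon (a sketch; obj is a stand-in for an objective that returns [value, gradient]):

```matlab
% Sketch: central finite-difference check of an analytic gradient at x0.
n   = numel(x0);
[~, g] = obj(x0);                 % analytic gradient (n-by-1)
h   = 1e-6;
gfd = zeros(n, 1);
for i = 1:n
    e = zeros(n, 1);  e(i) = h;
    gfd(i) = (obj(x0 + e) - obj(x0 - e)) / (2*h);
end
max(abs(g - gfd))                 % should be small relative to norm(g)
```

fmincon's CheckGradients option automates the same comparison.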
I will put some thought into implementing this. I would have to manually calculate the Jacobian since I do not have the symbolic math toolbox.
Well, the key is to recognize that q can be written in matrix-vector form as
q = f.'*I*M1(tau_1)*M2(tau_2)*M3(tau_3)*...*Mn(tau_n)
where the M_i(tau_i) are sparse upper triangular matrices. The above is a pretty easy form to take partial derivatives of.
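As an illustration only (the exact entries of the M_i depend on the recursion defining q_jk in the original question), a first-order recursion q(k) = u(k) + theta*q(k-1) unrolls into a product of sparse upper bidiagonal factors:

```matlab
% Illustration: a product of sparse upper triangular (bidiagonal) factors
% reproduces the recursion q(k) = u(k) + theta*q(k-1) as a row-vector product.
nt = 6;  dt = 1;  tau = 2;
theta = exp(-dt/tau);
u = rand(1, nt);                       % stand-in input sequence
M = speye(nt);
for k = 2:nt
    Mk = speye(nt);
    Mk(k-1, k) = theta;                % k-th factor carries state into step k
    M = M * Mk;
end
q = u * M;                             % equals the recursion above
```

Each factor is the identity plus a single superdiagonal entry, so the full product stays upper triangular and sparse, and each partial derivative with respect to a tau_i touches only its own factor.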
For example, the Jacobian of q with respect to f.' will be
Jf = kron((I*M1(tau_1)*M2(tau_2)*M3(tau_3)*...*Mn(tau_n)).', speye(np))
and the gradient of z with respect to f will be
dzdf(:) = -2*I*M1(tau_1)*M2(tau_2)*M3(tau_3)*...*Mn(tau_n)*(qobs - q).'
I tried parallelizing last night (6 agents) and the run was actually slower, so I would imagine that supplying the gradient is my best bet here to speed it up. I've got a few questions:
  1. Can you give some more info about the form of the sparse upper triangular matrix? I am not sure what a sparse upper triangular matrix that is a function of tau should look like.
  2. What should the size of the gradient be? Recall that my unknowns are f, which is np*ni, and tau, which is np, for a total of np*ni+np unknowns. Should the gradient not be with respect to each of these unknowns, not just f?
  3. qobs and q are np by nt -- does any summation need to occur across time? I'm not sure if the matrix multiplication is adding up otherwise. Hope it's not asking too much, but laying out the dimensionality of each matrix would be extremely helpful.
Apologies for the basic questions. You can tell this is not my area of expertise.


Asked: on 12 Jul 2021
Edited: on 24 Jul 2021
