Slow/Non-Convergence for Large Nonlinear Programming Problem -- Better Way to Solve/Formulate?

I have implemented a program to solve
For this problem, the f_ij's and tau_j's are the variables to be optimized. qobs_jk is given data, and q_jk is calculated as
The MATLAB code to optimize this problem looks something like (thank you Matt J for the help!):
%Initial guesses
sol0.f=f_guess; %ni by np
sol0.tau=tau_guess; %np by 1
%declare optimization variables, bounds, and constraints
f=optimvar('f',[ni,np],'LowerBound',0, 'UpperBound',1);
tau=optimvar('tau',np,'LowerBound',0);
con.sumbound=sum(f,2)<=1;
[p,idx]=prob2matrices({f,tau},'Constraints',con,'sol0',sol0); %convert constraints into appropriate data structure (function by Matt J https://www.mathworks.com/matlabcentral/fileexchange/74481-prob2matrices-a-selective-version-of-prob2struct)
%%Optimization
fun=@(unknowns)CRMPobj(unknowns, idx, injection, production, time, Gamma); %this step creates the function to be optimized
options= optimoptions('fmincon', 'MaxFunctionEvaluations', 10000); %enforce max function evaluations (default is 3000)
[unknowns, obj]=fmincon(fun,p.x0,p.Aineq,p.bineq,[],[],p.lb,p.ub,[],options); %this step solves the optimization
The problem solves correctly and quickly for a small example problem of ni=5, np=4, and nt=200. However, when moving to a real data set of ni=182, np=300, and nt=25, I fail to see any meaningful progress in the solution before the maximum number of function evaluations is reached (I am working on having MATLAB installed on my company's server, so RAM won't be an issue for long). I have a couple of remarks and guiding questions that may be useful in reaching a solution (forgive my lack of formal NLP knowledge):
  1. Before running out of memory, I am able to run fmincon() for approximately 20000 iterations. In this time, the solution obtained for f_ij and tau_j barely (if at all) changes from the initial guesses, and the objective function, accordingly, doesn't change much. Is this lack of change likely (1) because the problem requires many more iterations to converge to a correct solution (hundreds of thousands? millions? does this become a concern of computational time?) or (2) because the function is getting stuck in a local minimum? If the latter:
  2. Is fmincon() the best way to solve this problem? Are there other local optimizers that are more useful or is a particular global optimizer recommended?
  3. In terms of converging on a solution, is the number of constraints an issue, or is it more likely the number of variables being optimized? I am optimizing ni*np+np variables subject to ni constraints.
All in all, I am looking to better understand whether my limitation in solving this problem for a real data set is due (1) to computation time and memory limits AND/OR (2) to the way I have formulated this problem in MATLAB. Any insights into the nature of this problem and how I may reach a solution are much appreciated.

2 Comments

Before running out of memory, I am able to run fmincon() for approximately 20000 iterations
It may be good to see what optimoptions you are actually using. With the code you have posted (MaxFunctionEvaluations = 10000), fmincon could not possibly run for 20000 iterations.
What are the initial conditions on the recursion that defines q_jk? What is q_j(k-1) when k=1?


Answers (1)

Before running out of memory, I am able to run fmincon() for approximately 20000 iterations.
It is concerning that you are running out of memory. The amount of consumed memory should normally not increase as the optimization progresses.
In this time, the solution obtained for f_ij and tau_j is barely (if at all) changed from the initial guesses, and the objective function, accordingly, doesn't change much.
It might be worth checking the gradients and the Hessian to see if they have vanishingly small values.
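One way to watch for that during the run is an OutputFcn that logs the first-order optimality measure every iteration (a sketch; logOptimality is a name I'm making up, and the optimValues fields are the ones documented for fmincon output functions):

```matlab
% Sketch: an OutputFcn that logs the first-order optimality measure each
% iteration, to distinguish "needs more iterations" from "flat gradient".
function stop = logOptimality(x, optimValues, state)
stop = false;  % never request early termination
if strcmp(state, 'iter')
    fprintf('iter %5d  fval = %.6g  firstorderopt = %.3g  step = %.3g\n', ...
        optimValues.iteration, optimValues.fval, ...
        optimValues.firstorderopt, optimValues.stepsize);
end
end
```

Hook it up with options = optimoptions(options, 'OutputFcn', @logOptimality); a firstorderopt that is already tiny at the initial point would point to a flat region rather than a memory or iteration-count problem.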
Is fmincon() the best way to solve this problem?
fmincon is the only local optimizer that will solve a problem of this form. A global optimizer might be worth a try (e.g., ga()) if an educated initial guess is hard to derive.
In terms of converging upon a solution, are the number of constraints an issue, or is it likely the number of variables being optimized? I am optimizing ni*np+np variables for ni constraints.
My intuition says that it is a case of vanishing gradients. You'll recall that I had concerns about formulating the optimization in terms of tau, where the function has some very flat areas, as opposed to theta=exp(-delta_t/tau). The increase in the problem dimension may be exacerbating the problem.
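The flat areas can be made concrete (an illustration only, with an arbitrary delta_t): theta = exp(-delta_t/tau) has derivative dtheta/dtau = (delta_t/tau^2)*exp(-delta_t/tau), which collapses toward zero for both very small and very large tau, so gradient steps in tau barely move the model there:

```matlab
% Illustration: the sensitivity of theta = exp(-dt/tau) to tau vanishes at
% both extremes of tau, which starves gradient-based steps in tau.
dt  = 1;
tau = logspace(-2, 3, 6);                      % tau from 0.01 to 1000
dtheta_dtau = (dt ./ tau.^2) .* exp(-dt ./ tau);
disp([tau; dtheta_dtau])                       % second row -> ~0 at both ends
```

Optimizing over theta directly sidesteps those dead zones.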

17 Comments

Hello again, Matt! Good to hear from you. Let me try to address some of your points below:
It may be good to see what optimoptions you are actually using. With the code you have posted (MaxFunctionEvaluations = 10000), fmincon could not possibly run for 20000 iterations.
I should specify that, originally, I received the error "Requested 54900x54900 (22.5GB) array exceeds maximum array size preference (15.8GB)." from the command eye(np*ni) executed within fmincon(). I was able to get around this by unchecking "Limit the maximum array size to a percentage of RAM" in my preferences. This allowed me to avoid the above error and the subsequent crash for up to about 20000 iterations (30000 gave an "Out of Memory" error). My options are as follows:
fmincon options:
Options used by current Algorithm ('interior-point'):
(Other available algorithms: 'active-set', 'sqp', 'sqp-legacy', 'trust-region-reflective')
Set properties:
MaxFunctionEvaluations: 10000
Default properties:
Algorithm: 'interior-point'
BarrierParamUpdate: 'monotone'
CheckGradients: 0
ConstraintTolerance: 1.0000e-06
Display: 'final'
FiniteDifferenceStepSize: 'sqrt(eps)'
FiniteDifferenceType: 'forward'
HessianApproximation: 'bfgs'
HessianFcn: []
HessianMultiplyFcn: []
HonorBounds: 1
MaxIterations: 1000
ObjectiveLimit: -1.0000e+20
OptimalityTolerance: 1.0000e-06
OutputFcn: []
PlotFcn: []
ScaleProblem: 0
SpecifyConstraintGradient: 0
SpecifyObjectiveGradient: 0
StepTolerance: 1.0000e-10
SubproblemAlgorithm: 'factorization'
TypicalX: 'ones(numberOfVariables,1)'
UseParallel: 0
It may be good to see what optimoptions you are actually using. With the code you have posted (MaxFunctionEvaluations = 10000), fmincon could not possibly run for 20000 iterations.
See the text above. I tested running 10000 and 20000 iterations (it appears I changed it back to 10000 in the posted code). More iterations would have led to an "out of memory" error. The results for 10000 and 20000 iterations were both similar to each other and relatively unchanged from the initial guesses.
It might be worth checking the gradients and the Hessian to see if they have vanishingly small values.
After iteration 10000, the Hessian approximation appears, strangely, to be the identity matrix. It is exactly the identity matrix -- ones on the main diagonal and exactly zero elsewhere. What exactly does this indicate? Is this the case of vanishing gradients that you are mentioning?
fmincon is the only local optimizer that will solve a problem of this form. A global optimizer might be worth a try (e.g., ga()) if an educated initial guess is hard to derive.
I may look into the Global Optimization Toolbox if all else fails. Hopefully the cost will be approved if it is needed. The implementation looks relatively similar to fmincon(). I have implemented the algorithms for initial guesses as suggested in the literature, but it is possible that they may not be applicable to such a large problem.
My intuition says that it is a case of vanishing gradients. You'll recall that I had concerns about formulating the optimization in terms of tau, where the function has some very flat areas, as opposed to theta=exp(-delta_t/tau). The increase in the problem dimension may be exacerbating the problem.
I may look into running this formulation once my company gets MATLAB up on our server. I will have to do some data pre-processing so that delta_t is the same for all timesteps, ensuring that the tau back-calculated at the end is correct.
If the Hessian is in fact the problem, what are your recommendations to finding a solution? I will try the theta formulation this afternoon or tomorrow morning (backed up with a few other things!) and perhaps look into global optimization if we cannot get there locally.
Is this the case of vanishing gradients that you are mentioning?
The gradient is separate from the Hessian. However, on the subject of the Hessian, you should probably be using one of the fmincon option configurations that don't require you to compute the Hessian explicitly, e.g., HessianMultiplyFcn. Computing a Hessian explicitly is only practical in low-dimensional problems.
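For the interior-point algorithm specifically, the documented way to avoid a dense quasi-Newton matrix of size (ni*np+np)-by-(ni*np+np) is the limited-memory option (the memory length 10 below is an arbitrary choice):

```matlab
% Switch the interior-point quasi-Newton update from dense BFGS to
% limited-memory BFGS; {'lbfgs', m} keeps only m curvature pairs instead
% of a dense (ni*np+np)-by-(ni*np+np) matrix.
options = optimoptions('fmincon', ...
    'Algorithm', 'interior-point', ...
    'HessianApproximation', {'lbfgs', 10}, ...
    'MaxFunctionEvaluations', 1e4);
```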
Hello Matt,
My apologies for now just getting back to you. Some comments on the last week of progress:
I did indeed implement the theta formulation theta=exp(-delta_t/tau) for my problem. Based on benchmarking on smaller problems that do converge, the time to convergence is reduced by approximately 50%. However, for the larger problem that I am currently running, I am on the 4th day of running (approximately 40 million function evaluations), so we are talking days (at a minimum) to reach convergence.
The problem I am optimizing has np=300 and ni=15, so we are talking np*ni+np=4800 variables to optimize here. The problem is proving to be quite slow and perhaps not solvable in a reasonable amount of time.
Do you have any knowledge as to what options I may be able to tweak within fmincon() to speed up convergence? I suppose the major question I am asking is "how much can convergence speed be tweaked by changing the options of fmincon() and how much is unresolvable due to the nature (size) of the problem itself?"
Let me also address your previous comment: "You should probably be using one of the fmincon option configurations that don't require you to compute Hessian explicitly, e.g., HessianMultiplyFcn. Computing a Hessian is only practical in low dimensional problems." Currently, I have HessianMultiplyFcn set to [], and the algorithm fmincon() is using is interior-point with HessianApproximation set to 'bfgs'. Would you recommend that I use 'lbfgs' or 'finite-difference' for HessianApproximation instead, since these do not form a full Hessian?
Thanks again for your help.
I don't know how you are assessing convergence. What are the options that you have set? If the iterations are not progressing, how do you know that there is any progress to be made? How do you know that the solution reached after 100 iterations is not locally optimum?
I should have specified -- I constrained the problem a bit (from np=300 and ni=182 to np=300 and ni=15) and now the problem does progress. I am assessing convergence by meeting the criteria "fmincon stopped because the size of the current step is less than the value of the step size tolerance and constraints are satisfied to within the value of the constraint tolerance." For a smaller problem, say, np=30 and ni=15, this converges in about 10 minutes. For np=300 and ni=15, I have been running the problem for about 4 days now and have not reached convergence or even a solution that makes physical sense. The options I am using are as follows:
fmincon options:
Options used by current Algorithm ('interior-point'):
(Other available algorithms: 'active-set', 'sqp', 'sqp-legacy', 'trust-region-reflective')
Set properties:
MaxFunctionEvaluations: 10000000
MaxIterations: 1000000
Default properties:
Algorithm: 'interior-point'
BarrierParamUpdate: 'monotone'
CheckGradients: 0
ConstraintTolerance: 1.0000e-06
Display: 'final'
FiniteDifferenceStepSize: 'sqrt(eps)'
FiniteDifferenceType: 'forward'
HessianApproximation: 'bfgs'
HessianFcn: []
HessianMultiplyFcn: []
HonorBounds: 1
ObjectiveLimit: -1.0000e+20
OptimalityTolerance: 1.0000e-06
OutputFcn: []
PlotFcn: []
ScaleProblem: 0
SpecifyConstraintGradient: 0
SpecifyObjectiveGradient: 0
StepTolerance: 1.0000e-10
SubproblemAlgorithm: 'factorization'
TypicalX: 'ones(numberOfVariables,1)'
UseParallel: 0
have not reached convergence or even a solution that makes physical sense
That suggests to me that you have not used an initial guess that makes physical sense. How are you choosing your initial point?
I am implementing initial guesses as suggested by the literature. The f_ij initial guesses are calculated using inverse distance, whereas the tau_j's are the output of a smaller optimization problem in which all of the q_jk and I_ik are lumped into single values (np=1, ni=1) q_k and I_k.
I'm not sure what inverse distance means. I notice that the minimization problem reduces to a linear least-squares problem in f when the taus are known and fixed. You should probably use lsqlin to derive the initial guesses for the f_ij, subject to whatever initial values you've chosen for tau. It would also be good to see the Display='iter' output of the optimization.
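A sketch of the lsqlin idea, assuming (based on the matrix form q = f.'*I*M1(tau_1)*...*Mn(tau_n) discussed later in this thread) that A = I*M1(tau_1)*...*Mn(tau_n) has already been built for the fixed initial taus:

```matlab
% Sketch: initial f by linear least squares with tau fixed.
% q = f.'*A  =>  q.' = A.'*f  =>  vec(q.') = kron(eye(np), A.')*f(:), so
% the fit is one bound-constrained lsqlin over all ni*np entries of f.
C     = kron(speye(np), A.');             % (np*nt)-by-(np*ni) design matrix
d     = reshape(qobs.', [], 1);           % stacked observed rates
Aineq = kron(ones(1, np), speye(ni));     % sum over j of f(i,j) <= 1
bineq = ones(ni, 1);
lb    = zeros(ni*np, 1);
ub    = ones(ni*np, 1);
fvec  = lsqlin(C, d, Aineq, bineq, [], [], lb, ub);
f0    = reshape(fvec, ni, np);            % matches optimvar('f',[ni,np])
```

The inequality rows mirror the sum(f,2)<=1 constraint from the original formulation, so the initial guess is feasible by construction.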
Inverse distance means that initial guesses for fij are weighted higher for i and j that are closer to each other (we are talking oil and gas wells, so it is a good first assumption to assume closer wells are more connected).
Using lsqlin to derive initial guesses sounds like a good idea. I must say that I am unsure how to formulate the objective function in terms of "C" and "d" (my fmincon is formulated as fmincon(fun,p.x0,p.Aineq,p.bineq,[],[],p.lb,p.ub,[],options)).
I currently have a run going, but I will share with you the 'display', 'iter' output for the next run, likely tomorrow or Saturday (I am running 10^7 MaxFunctionEvaluations per day).
I may also look into how much the parallel computing toolbox will speed up this problem. I do not have access to this toolbox at work so I will have to try this problem at home.
Rather than using the Parallel Computing Toolbox, I think it would be more beneficial if you supply your own analytical gradient computations (SpecifyObjectiveGradient=true), and if possible your own Hessian computation (with HessianFcn).
As an alternative to supplying a Hessian, you could try one of the other fmincon algorithms that don't require a Hessian computation, like Active-Set or SQP. However, even in this case, I think you should at least provide an analytical gradient routine.
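Mechanically, supplying a gradient just means the objective returns a second output and SpecifyObjectiveGradient is set; a toy sketch with a stand-in least-squares objective (the real version would return the CRMP objective and its analytic gradient):

```matlab
% Toy sketch of SpecifyObjectiveGradient: fmincon requests the second
% output (the gradient) only when the option is enabled.
c    = [1; 2; 3];
fun  = @(x) deal(sum((x - c).^2), 2*(x - c));   % [objective, gradient]
opts = optimoptions('fmincon', 'SpecifyObjectiveGradient', true);
xopt = fmincon(fun, zeros(3, 1), [], [], [], [], [], [], [], opts);
```

deal lets one anonymous function supply both outputs; a named function with an `if nargout > 1` branch does the same for anything nontrivial.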
Admittedly, I do not have the understanding to know what goes into supplying a gradient. Is this a rudimentary calculation or does it require some a priori knowledge for the specific problem?
I don't know if analytically calculating the gradient is possible, since qobs_jk and I_ik are real, measured data that are implicitly a function of tau and f. Finite differencing would be needed anyway, I'd assume.
since qobs_jk and I_ik are real, measured data that are implicitly a function of tau and f
I don't see how measured data could be a function of anything. They are not calculated during the optimization, right? They act as constants during the optimization, don't they?
The gradient should be given by
gradient(tau,f) = -2*J(tau,f).'*(qobs - q(tau,f))
where J(tau,f) is the Jacobian of q with respect to tau and f.
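Whatever form the analytic gradient ends up taking, it's worth spot-checking it against central finite differences at a test point before trusting it inside fmincon (a sketch; obj is a stand-in for an objective that returns [value, gradient]):

```matlab
% Sketch: central finite-difference check of an analytic gradient at x0.
n   = numel(x0);
[~, g] = obj(x0);                 % analytic gradient (n-by-1)
h   = 1e-6;
gfd = zeros(n, 1);
for i = 1:n
    e = zeros(n, 1);  e(i) = h;
    gfd(i) = (obj(x0 + e) - obj(x0 - e)) / (2*h);
end
max(abs(g - gfd))                 % should be small relative to norm(g)
```

fmincon's CheckGradients option automates the same comparison.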
I will put some thought into implementing this. I would have to manually calculate the Jacobian since I do not have the symbolic math toolbox.
Well, the key is to recognize that q can be written in matrix-vector form as
q = f.'*I*M1(tau_1)*M2(tau_2)*M3(tau_3)*...*Mn(tau_n)
where the M_i(tau_i) are sparse upper triangular matrices. The above is a pretty easy form to take partial derivatives of.
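As an illustration only (the exact entries of the M_i depend on the recursion defining q_jk in the original question), a first-order recursion q(k) = u(k) + theta*q(k-1) unrolls into a product of sparse upper bidiagonal factors:

```matlab
% Illustration: a product of sparse upper triangular (bidiagonal) factors
% reproduces the recursion q(k) = u(k) + theta*q(k-1) as a row-vector product.
nt = 6;  dt = 1;  tau = 2;
theta = exp(-dt/tau);
u = rand(1, nt);                       % stand-in input sequence
M = speye(nt);
for k = 2:nt
    Mk = speye(nt);
    Mk(k-1, k) = theta;                % k-th factor carries state into step k
    M = M * Mk;
end
q = u * M;                             % equals the recursion above
```

Each factor is the identity plus a single superdiagonal entry, so the full product stays upper triangular and sparse, and each partial derivative with respect to a tau_i touches only its own factor.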
For example, the Jacobian of q with respect to f.' will be
Jf = kron((I*M1(tau_1)*M2(tau_2)*M3(tau_3)*...*Mn(tau_n)).', speye(np))
and the gradient of z with respect to f will be
dzdf(:) = -2*I*M1(tau_1)*M2(tau_2)*M3(tau_3)*...*Mn(tau_n)*(qobs - q).'
I tried parallelizing last night (6 agents) and the run was actually slower, so I would imagine that supplying the gradient is my best bet here to speed it up. I've got a few questions:
  1. Can you give some more info about the form of the sparse upper triangular matrix? I am not sure what a sparse upper triangular matrix that is a function of tau should look like.
  2. What should the size of the gradient be? Recall that my unknowns are f, which is np*ni, and tau, which is np, for a total of np*ni+np unknowns. Should the gradient not be with respect to each of these unknowns, not just f?
  3. qobs and q are np by nt -- does any summation need to occur across time? I'm not sure if the matrix multiplication is adding up otherwise. Hope it's not asking too much, but laying out the dimensionality of each matrix would be extremely helpful.
Apologies for the basic questions. You can tell this is not my area of expertise.


Asked: on 12 Jul 2021
Edited: on 24 Jul 2021
