clarification about first order optimality

I have been using Matlab in my design optimization course, and I wanted to ask a bit about the output.firstorderopt values generated by fmincon.
Obviously, this needs to be a small value at a solution, and I see that this value corresponds to the infinity norm of the vector sum that represents the gradient of the Lagrangian function. This corresponds to the "first" Karush-Kuhn-Tucker condition here: https://www.mathworks.com/help/optim/ug/first-order-optimality-measure.html (eqn 2).
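For anyone following along, the quantity on that documentation page is the stationarity (gradient-of-Lagrangian) KKT condition, and the reported measure is its infinity norm:

```latex
\nabla_x L(x,\lambda) \;=\; \nabla f(x) \;+\; \sum_i \lambda_{g,i}\,\nabla g_i(x) \;+\; \sum_i \lambda_{h,i}\,\nabla h_i(x),
\qquad
\texttt{firstorderopt} \;=\; \bigl\| \nabla_x L(x,\lambda) \bigr\|_\infty .
```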
I had a student ask me a question about this, and after a little exploration, I found for a simple test problem I can calculate the gradient of the Lagrangian at the solution to a constrained minimization problem and it is different from the value in output.firstorderopt.
Here is the simple example problem:
minimize f(x) = x(1)^4 + x(2)^2 - x(1)^2 * x(2)
subject to g_1(x) = 1 - 2 * x(1) * x(2) / 3 <= 0
g_2(x) = (3 * x(1)^2 - 4 * x(2)) / 3 + 1 <= 0
I set this up to solve using SQP via fmincon, starting from the initial design x0 = [2; 4]. Here is the objective function script:
function [f, grad_f] = class24_fun(x)
    % in-class example objective function
    f = x(1)^4 + x(2)^2 - x(1)^2 * x(2);
    if nargout > 1
        grad_f = [4*x(1)^3 - 2*x(1)*x(2);
                  2*x(2) - x(1)^2];
    end
end
Here is the nonlinear constraint script:
function [g, h, grad_g, grad_h] = class24_con(x)
    % in-class example nonlinear constraints
    % inequality constraints
    g(1) = 1 - 2 * x(1) * x(2) / 3;
    g(2) = (3 * x(1)^2 - 4 * x(2)) / 3 + 1;
    % equality constraints
    h = []; % none in this problem
    % compute gradients
    if nargout > 2
        % gradients of inequality; each column in
        % array is one gradient vector
        grad_g = [-2 * x(2) / 3,  2 * x(1);
                  -2 * x(1) / 3, -4/3];
        % equality
        grad_h = []; % no equality constraints
    end
end
Here is how I solved the problem and then computed my own version of the first-order optimality measure. I left off several semicolons to show the results.
>> x0 = [2; 4];
>> options = optimoptions('fmincon','Algorithm','sqp');
>> [xstar,fstar,exitflag,output,lambda,grad,hessian] = fmincon(@class24_fun,x0,...
[],[],[],[],[],[],@class24_con,options)
Local minimum found that satisfies the constraints.
Optimization completed because the objective function is non-decreasing in
feasible directions, to within the value of the optimality tolerance,
and constraints are satisfied to within the value of the constraint tolerance.
<stopping criteria details>
xstar =
1.0000
1.5000
fstar =
1.7500
exitflag =
1
output =
struct with fields:
iterations: 6
funcCount: 26
algorithm: 'sqp'
message: '↵Local minimum found that satisfies the constraints.↵↵Optimization completed because the objective function is non-decreasing in ↵feasible directions, to within the value of the optimality tolerance,↵and constraints are satisfied to within the value of the constraint tolerance.↵↵<stopping criteria details>↵↵Optimization completed: The relative first-order optimality measure, 2.244896e-08,↵is less than options.OptimalityTolerance = 1.000000e-06, and the relative maximum constraint↵violation, 1.443290e-15, is less than options.ConstraintTolerance = 1.000000e-06.↵↵'
constrviolation: 1.4433e-15
stepsize: 7.3233e-08
lssteplength: 1
firstorderopt: 4.4898e-08
lambda =
struct with fields:
eqlin: [0×1 double]
eqnonlin: [0×1 double]
ineqlin: [0×1 double]
ineqnonlin: [2×1 double]
lower: [2×1 double]
upper: [2×1 double]
grad =
1.0000
2.0000
hessian =
10.0144 -3.2948
-3.2948 2.1570
>> [fstar,grad_fstar] = class24_fun(xstar)
fstar =
1.7500
grad_fstar =
1.0000
2.0000
>> [gstar,hstar,grad_gstar,grad_hstar] = class24_con(xstar)
gstar =
1.0e-14 *
0.0777 0.1443
hstar =
[]
grad_gstar =
-1.0000 2.0000
-0.6667 -1.3333
grad_hstar =
[]
>> grad_L = grad_fstar + lambda.ineqnonlin(1) * grad_gstar(:,1) + lambda.ineqnonlin(2) * grad_gstar(:,2)
grad_L =
1.0e-07 *
-0.2588
-0.1047
>> myfirstorderopt = norm(grad_L,Inf)
myfirstorderopt =
2.5883e-08
>> output.firstorderopt
ans =
4.4898e-08
Both numbers are very small, and I know that xstar is the minimum of the function, but is the difference between myfirstorderopt and output.firstorderopt simply a result of truncation error, or am I missing some other feature of how fmincon computes this first-order optimality measure?

Accepted Answer

Matt J on 4 Dec 2020
Edited: Matt J on 4 Dec 2020
It is because you have not set the options SpecifyObjectiveGradient=true and SpecifyConstraintGradient=true in your call to fmincon. Your calculation of firstorderopt is based on analytical gradients, while fmincon is using finite-difference approximations. We see much better agreement when these options are enabled:
options = optimoptions('fmincon','Algorithm','sqp','Display','none',...
'SpecifyObjectiveGradient',true,'SpecifyConstraintGradient',true);
[xstar,fstar,exitflag,output,lambda,grad,hessian] = ...
fmincon(@class24_fun,[2; 4],[],[],[],[],[],[],@class24_con,options);
[fstar,grad_fstar] = class24_fun(xstar);
[gstar,hstar,grad_gstar,grad_hstar] = class24_con(xstar);
grad_L = grad_fstar + grad_gstar*lambda.ineqnonlin;
myFOOM = norm(grad_L,inf)
myFOOM = 1.0004e-08
tmwFOOM=output.firstorderopt
tmwFOOM = 1.0004e-08
There are some residual differences between the two calculations, which I think can be attributed to floating-point round-off:
abs(myFOOM-tmwFOOM)
ans = 7.4015e-17
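As a side check (not part of the original question), you can reproduce the size of the error that finite differencing introduces when the gradient options are off. This is only a rough sketch of what fmincon does internally; the forward-difference step sqrt(eps) is fmincon's documented default FiniteDifferenceStepSize for forward differences, but fmincon's internal bookkeeping may differ in detail:

```matlab
% Forward-difference gradient of the objective at xstar, approximating
% what fmincon computes when SpecifyObjectiveGradient is false.
h = sqrt(eps);                       % documented default forward-difference step
f0 = class24_fun(xstar);
fd_grad = zeros(2, 1);
for k = 1:2
    xk = xstar;
    xk(k) = xk(k) + h;               % perturb one coordinate at a time
    fd_grad(k) = (class24_fun(xk) - f0) / h;
end
[~, grad_f] = class24_fun(xstar);    % analytical gradient for comparison
norm(fd_grad - grad_f, inf)          % small residual, same order as the mismatch above
```

The discrepancy between the finite-difference and analytical gradients is the same order of magnitude as the gap the original poster observed between myfirstorderopt and output.firstorderopt, which is consistent with the explanation above.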
function [f, grad_f] = class24_fun(x)
    % in-class example objective function
    f = x(1)^4 + x(2)^2 - x(1)^2 * x(2);
    if nargout > 1
        grad_f = [4*x(1)^3 - 2*x(1)*x(2);
                  2*x(2) - x(1)^2];
    end
end
function [g, h, grad_g, grad_h] = class24_con(x)
    % in-class example nonlinear constraints
    % inequality constraints
    g(1) = 1 - 2 * x(1) * x(2) / 3;
    g(2) = (3 * x(1)^2 - 4 * x(2)) / 3 + 1;
    % equality constraints
    h = []; % none in this problem
    % compute gradients
    if nargout > 2
        % gradients of inequality; each column in
        % array is one gradient vector
        grad_g = [-2 * x(2) / 3,  2 * x(1);
                  -2 * x(1) / 3, -4/3];
        % equality
        grad_h = []; % no equality constraints
    end
end
  1 Comment
William Crossley on 4 Dec 2020
Thank you. Good example of needing to pay attention to all of the features in the problem set up.


Release

R2020a