Can I use automatic differentiation in fmincon?

14 views (last 30 days)
Michael Sikivie on 10 Sep 2017
Answered: Steve Grikschat on 8 Oct 2020
I have a likelihood function that I'm solving in Matlab but it takes several hours to maximize using fmincon. I wondered if the package autodiff, or some Matlab package for automatic differentiation, could speed it up. If so, how can I use it with fmincon or some other constrained optimization command? There doesn't seem to be an automatic differentiation option in the fmincon documentation.

Alan Weiss on 11 Sep 2017
I doubt that you would get any significant speedup using automatic differentiation if you use it simply to specify the gradient of the objective. fmincon usually does a good job estimating gradients by finite differences, and usually this is not very time-consuming. However, if you would compute the Hessian of the Lagrangian using automatic differentiation, then you might get some speedup. See this example using symbolic computations of gradients and Hessians, where the result took only 19 iterations using an analytic Hessian, but 79 iterations using finite differences.
Yes, symbolic computations are not exactly the same as AD, but after using matlabFunction they are quite comparable.
Alan Weiss
MATLAB mathematical toolbox documentation
Alan Weiss on 18 Sep 2017
Well, if your objective function has analytic derivatives, and does not include any integration, then generally it doesn't take all that long to evaluate. The length of time to evaluate the analytic expression for the gradient is quite often longer than the length of time to evaluate the objective a few times when approximating a gradient. Generally, if you have an N-dimensional variable, then fmincon uses N objective function evaluations to approximate a gradient using forward finite differences.
Feel free to use tic and toc statements on the example I linked to see that analytic gradients often don't save time. However, the analytic Hessian certainly saves time, mainly by substantially lowering the number of iterations.
Alan Weiss
MATLAB mathematical toolbox documentation

Shaun Forth on 11 Sep 2017
Hi Michael,
To the best of my knowledge the Matlab optimization toolbox still does not have automatic differentiation.
Some key questions are:
• How many inputs N do you have and how many constraints Mc? As you have a constrained problem (objective + constraints) you will have M=Mc+1 functions required to be differentiated. If M<<N then reverse (adjoint) mode AD would be preferred.
• Is your code vectorized? The effect of vectorization in your code may be huge. If you have a large loop then for overloaded AD each operation in the loop has to be overloaded and functions called each cycle of the loop. The overhead of calling the functions may be substantial. If that same loop were instead vectorized then just one function is called for each operation. For reverse mode, even source transformation AD tools are likely to store data to a so-called tape (a.k.a. stack) in order to perform the reverse pass through the code to compute derivatives. Loops will result in many calls of a function to store data, vectorized functions result in many fewer calls (though with more data stored each call).
For Matlab the operator-overloaded AD tools are relatively robust, my own tool MAD has been used for many years but only supports forward mode. It is also integrated within the TOMLAB optimization library. The source transformation tools may require more interaction with the developer to get the coverage of all the Matlab functions used in your code.
A list of Matlab AD tools may be found at http://www.autodiff.org/?module=Tools&language=MATLAB
Regards Shaun
Michael Sikivie on 11 Sep 2017
Regarding inputs and constraints:At present there are 31 parameters to optimize over and 2 constraints, an equality constraint and an inequality constraint. This doesn't count bounds on individual parameters, should I count those as constraints? My ambition is to estimate a probability weighted mixture of 2 or 3 specifications of the model, so possibly 60-90 parameters and 4-6 constraints.
As far as vectorization, the objective to maximize is definitely expressed as a product of matrices, if that's what you mean. I'd say maybe half or more of the work is elementwise operations on matrices, and the rest is multiplying or dividing those matrices by each other. But most of the time passes within a nested loop because each observation is reduced along several different dimensions to come up with a single probability, and it was hard to do that with 2-D matrix algebra alone. Let me know if I need to explain what I mean by that.

Steve Grikschat on 8 Oct 2020
Updated for R2020b
Optimization Toolbox now includes automatic differentiation for problems built with the problem-based workflow. Nonlinear constrained and unconstrained problems now solve with fmincon and fminunc using automatic differentiation.
This blog post by Alan Weiss explains it excellently