
  • Advice/tips for numerical optimization

    Hi All:

    I am dealing with a hard-to-maximize likelihood function. My experience so far is that cycling between multiple optimization algorithms via ml's technique() option is very useful for finding the peaks of difficult-to-maximize functions. I have used technique(bhhh 4 nr 5), technique(nr 5 dfp 5), and so on. Doing so achieves convergence, but only after a large number of iterations (70-90), and the maximization is quite slow.

    Following the excellent advice in Gould et al.'s Maximum Likelihood Estimation with Stata (4th edition), examining the trace reveals that the Newton-Raphson (NR) steps stay in the problematic region of the likelihood for most of the run, although NR eventually escapes it and stabilizes towards the end, when convergence is achieved. In contrast, the Davidon-Fletcher-Powell (DFP) steps are (expectedly) much faster and find the function's peak in fewer iterations, apart from backing up in the first few iterations.

    I am interested in your thoughts on using DFP instead of NR in this scenario. We have around 50k observations, and calculating the Hessian is very expensive because simulation is involved in the maximization. The assumption of a random sample seems reasonable in this particular application, so I am inclined towards using the empirical OPG variance estimator, but I am curious whether I might be missing something here.
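
    For concreteness, a minimal sketch of the kind of call I have in mind follows; the evaluator, equation, and variable names are just placeholders for my actual model:

    Code:
    * Sketch only: pure DFP for the maximization and the OPG (BHHH-style)
    * variance estimator, so the numerical Hessian is not needed for the
    * standard errors.
    ml model lf mysim_ll (xb: y = x1 x2 x3) /lnsig, technique(dfp) vce(opg)
    ml maximize, difficult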

    In case it helps, I am trying to maximize a joint discrete-continuous choice likelihood, with integrals involved both in the likelihood itself and in computing nuisance heterogeneity parameters, so I am using maximum simulated likelihood in the likelihood evaluator program.
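
    For what it is worth, a heavily simplified skeleton of the evaluator pattern I am using is below (it would be defined before issuing the ml model call above); everything in it, including the random-intercept probit structure, the draw variables draw1-draw50, and the names, is a hypothetical stand-in for my actual model, just to show the averaging-over-draws idea:

    Code:
    * Skeleton only: lf-type evaluator for a probit with a normal random
    * intercept, estimated by maximum simulated likelihood. Assumes 50 draws
    * per observation were generated beforehand and stored in draw1-draw50
    * (e.g., Halton draws).
    program define mysim_ll
        version 17
        args lnfj xb lnsig
        tempvar psum
        quietly generate double `psum' = 0
        forvalues r = 1/50 {
            * accumulate the conditional probability at each draw
            quietly replace `psum' = `psum' + ///
                normal(cond($ML_y1 == 1, 1, -1)*(`xb' + exp(`lnsig')*draw`r'))
        }
        * simulated log likelihood = log of the average over draws
        quietly replace `lnfj' = ln(`psum'/50)
    end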

    Thanks much for any suggestions!

  • #2
    What works, works. So if Davidon-Fletcher-Powell (DFP) does the job to your satisfaction, go for it. Another maximization method that works well for difficult integrands is adaptive quadrature. However, these methods don’t have to be mutually exclusive.

    You can start with DFP or adaptive quadrature, and once you achieve convergence, switch to Newton-Raphson for faster refinement. This hybrid approach is used in practice; for example, the gllamm command from SSC follows a similar strategy (see below).

    Code:
    webuse tvsfpors, clear
    gen cctv = cc*tv
    gllamm thk prethk cc tv cctv, i(school) family(binomial) link(ologit) adapt
    Res.:

    Code:
    . gllamm thk prethk cc tv cctv, i(school) family(binomial) link(ologit) adapt
    
    Running adaptive quadrature
    Iteration 0:    log likelihood = -2123.8577
    Iteration 1:    log likelihood = -2120.0494
    Iteration 2:    log likelihood = -2119.7702
    Iteration 3:    log likelihood = -2119.7605
    Iteration 4:    log likelihood = -2119.7506
    Iteration 5:    log likelihood = -2119.7444
    Iteration 6:    log likelihood = -2119.7442
    
    
    Adaptive quadrature has converged, running Newton-Raphson
    Iteration 0:  Log likelihood = -2119.7442  
    Iteration 1:  Log likelihood = -2119.7442  (backed up)
    Iteration 2:  Log likelihood = -2119.7428  
    Iteration 3:  Log likelihood = -2119.7428  
     
    number of level 1 units = 1600
    number of level 2 units = 28
     
    Condition Number = 16.579687
     
    gllamm model 
     
    log likelihood = -2119.7428
     
    ------------------------------------------------------------------------------
             thk | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    thk          |
          prethk |   .4032892     .03886    10.38   0.000      .327125    .4794533
              cc |   .9237908   .2040746     4.53   0.000     .5238119     1.32377
              tv |   .2749939   .1977431     1.39   0.164    -.1125754    .6625633
            cctv |  -.4659264   .2845972    -1.64   0.102    -1.023727     .091874
    -------------+----------------------------------------------------------------
    _cut11       |
           _cons |  -.0884495   .1641067    -0.54   0.590    -.4100927    .2331937
    -------------+----------------------------------------------------------------
    _cut12       |
           _cons |   1.153364   .1656165     6.96   0.000     .8287615    1.477966
    -------------+----------------------------------------------------------------
    _cut13       |
           _cons |    2.33195   .1734203    13.45   0.000     1.992052    2.671847
    ------------------------------------------------------------------------------
     
     
    Variances and covariances of random effects
    ------------------------------------------------------------------------------
    
     
    ***level 2 (school)
     
        var(1): .07351208 (.03831112)
    ------------------------------------------------------------------------------



    • #3
      Many thanks for your response! DFP worked with my likelihood this time. We have a 7-dimensional rectangular integral, so (adaptive) quadrature was unfortunately not an option. As a side note, looking at your output, it appears that gllamm uses adaptive quadrature as an optimizer, beyond its typical use for numerical integration (e.g., as in Mata's Quadrature() function). I'll look into this further. Thanks.



      • #4
        Originally posted by Behram Wali
        looking at your output, it appears gllamm uses adaptive quadrature as an optimizer
        Indeed, the idea is to use adaptive quadrature to navigate the problematic regions of the likelihood function where Newton-Raphson struggles. Once you move closer to the maximum, where the likelihood surface is better behaved, you can switch to Newton-Raphson, as it is typically more efficient in that region. To implement this strategy, you would generally set the tolerance level for the adaptive-quadrature stage not too conservatively, allowing for a smooth transition to Newton-Raphson.
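
        If you want to mimic this two-stage idea with ml rather than gllamm, a rough sketch is below; the evaluator and variable names are hypothetical, and the tolerance values are only illustrative:

        Code:
        * Stage 1: DFP with loose convergence criteria to get near the maximum
        * cheaply (nonrtolerance turns off the scaled-gradient criterion).
        ml model lf mysim_ll (xb: y = x1 x2 x3) /lnsig, technique(dfp)
        ml maximize, tolerance(1e-3) ltolerance(1e-5) nonrtolerance
        matrix b_dfp = e(b)              // carry the stage-1 estimates forward

        * Stage 2: restart Newton-Raphson from the stage-1 estimates for the
        * final refinement.
        ml model lf mysim_ll (xb: y = x1 x2 x3) /lnsig, technique(nr)
        ml init b_dfp, copy
        ml maximize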

        In your case, you could experiment with using DFP as an optimizer, as it can provide a good balance between convergence stability and computational efficiency.
