Hi All:
I am dealing with a hard-to-maximize likelihood function. My experience so far is that cycling between multiple optimization algorithms via ml's technique() option is very useful for finding the peaks of difficult functions. I have used technique(bhhh 4 nr 5), technique(nr 5 dfp 5), and so on. Doing so achieves convergence, but only after a large number of iterations (70-90), and the maximization is quite slow.
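For reference, the setup looks roughly like this (the evaluator, equation, and variable names below are placeholders, not my actual code):

ml model lf mysim_lf (xb: y = x1 x2 x3) (lnsigma:), technique(nr 5 dfp 5)
ml maximize

so ml alternates five NR iterations with five DFP iterations until convergence.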
Following the excellent advice in Gould et al., Maximum Likelihood Estimation with Stata (4th edition), examining the trace of the iterations reveals that the Newton-Raphson (NR) steps stay in a problematic region of the likelihood for most of the maximization, although NR eventually escapes it and becomes stable near the end, when convergence is achieved. In contrast, the Davidon-Fletcher-Powell (DFP) steps are, as expected, much faster and reach the peak of the function in fewer iterations, apart from some backing up in the first few iterations.
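In case it matters, I am inspecting the iteration log with the usual maximize options, e.g. (same placeholder names as above):

ml maximize, trace gradient showstep

which displays the current parameter vector, the gradient, and the step information at each iteration, so it is easy to see where NR gets stuck.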
I would be interested in your thoughts on using DFP instead of NR in this scenario. We have around 50,000 observations, and calculating the Hessian is very expensive because simulation is involved in the maximization. The assumption of a random sample seems reasonable in this particular application, so I am inclined towards using the OPG (outer product of gradients) variance estimator, but I was wondering whether I might be missing something here.
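Concretely, what I have in mind is something like this (again with placeholder names, not my actual model):

ml model lf mysim_lf (xb: y = x1 x2 x3) (lnsigma:), technique(dfp) vce(opg)
ml maximize

i.e., let DFP do all the work so the Hessian is never computed during the iterations, and report OPG-based standard errors. As I understand it, with vce(opg) ml also does not need one final Hessian calculation at the solution, which the default oim VCE would require even when the technique is dfp; and under correct specification and random sampling the OPG and inverse-Hessian estimators should be asymptotically equivalent.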
In case it helps: I am maximizing a joint discrete-continuous choice likelihood, with integrals involved both in the likelihood itself and in computing nuisance heterogeneity parameters, so I am using maximum simulated likelihood in the likelihood evaluator program.
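For what it is worth, the structure of the evaluator is roughly as below. This is a stripped-down, purely illustrative lf-method sketch for the continuous part only (the discrete part enters the conditional likelihood inside the draw loop in the real program); the variables udraw1, udraw2, ... and the global R holding the number of draws are assumed to have been created beforehand, e.g. from Halton sequences:

program define mysim_lf
    version 15
    args lnf xb lnsigma
    tempvar psum
    quietly generate double `psum' = 0
    // average the conditional density over the R simulation draws
    forvalues r = 1/$R {
        quietly replace `psum' = `psum' + ///
            normalden($ML_y1 - `xb' - udraw`r', 0, exp(`lnsigma'))
    }
    quietly replace `lnf' = ln(`psum'/$R)
end

Nothing exotic, but because every function evaluation loops over the draws, anything ml does that requires extra evaluations (numerical Hessians in particular) gets expensive quickly.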
Thanks much for any suggestions!