Dear Stata Users,
Does anyone have a strategy for speeding up estimation with the cmxtmixlogit command?
I am estimating a set of choice models using a large panel dataset with approximately 60,000 cases and 200,000 observations. My preferred specification includes both fixed and random parameters, as well as a set of case-specific covariates.
Although I have also considered user-written commands such as mixlogit, I am using cmxtmixlogit because of its post-estimation support for margins and the ease of specifying case-specific variables. I am aware that cmxtmixlogit is not parallelized, so running it on a multi-core computer with Stata/MP (e.g., 8 cores), or even on a computing cluster, would not improve estimation speed. Because of the large sample size, it is also not feasible to set a small number of intpoints, as the models then do not converge. The simulation takes especially long when I include additional case-specific covariates, such as week fixed effects (24 weeks × 5 alternatives), which amount to a large number of covariates. So far, I have not been able to estimate a model using all the data and all the covariates, even after letting the command run for up to a week.
I am hoping that there is something that I can tweak to be able to use the cmxtmixlogit command with my data. Any suggestions would be greatly appreciated.
Thanks,
Paul

I forgot to mention: another kludge you may try in the future is something like -technique(bfgs 15 nr 5)-, which asks Stata to use BFGS for 15 iterations and then Newton-Raphson (nr) for 5 iterations, cycling between the two until convergence. This sometimes helps the numerical solver get past flat regions of the log-likelihood function. There's nothing special about the numbers 15 and 5; you can experiment with different combinations until the dough feels right. The -(backed up)- messages are OK as long as the log-likelihood value actually improves over iterations (however marginally) and they don't pop up during the last few iterations before Stata declares convergence.
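For illustration, here is a minimal sketch of where the option goes. It assumes the -transport- example dataset from the Stata manuals, with the variables id, t, alt, choice, trcost, trtime, age and income; those names are assumptions on my part, so substitute your own panel, time, alternative, choice and covariate variables.

```stata
* Sketch only: assumes the -transport- example dataset and its variable
* names (id, t, alt, choice, trcost, trtime, age, income).
* Replace them with your own variables.
webuse transport, clear

* Declare panel choice data: panel id, time variable, alternatives
cmset id t alt

* Run BFGS for 15 iterations, then Newton-Raphson for 5,
* cycling between the two until convergence
cmxtmixlogit choice trcost, random(trtime) casevars(age income) ///
    technique(bfgs 15 nr 5)
```

The same -technique()- spec can be added to your own specification unchanged; only the iteration counts are worth tuning.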