  • #16
    Hi everyone, here is a more substantial update. I have been running models for more than two weeks now and tried several alternative options. Though none of the models achieved convergence, I believe I made significant steps forward throughout the various attempts. Here are some of the things I found:

    - The default maximization -technique- (nr) doesn't work well with my models. Even when setting a low number of integration points (e.g., 50), it iterates extremely slowly: in 120 hours (the maximum run time on the cluster I am using) it completes just one iteration. Conversely, as suggested by Hong Il Yoo, -bhhh- and -bfgs- proceed much faster: even with the default number of integration points, they complete tens of iterations (see the sketch after this list). They have not converged yet, but more on this below.

    - As expected, using the default number of integration points results in a much more effective maximization than using 50 or 100 points. Therefore, it might be better to choose a faster -technique- and avoid setting a low number of points. (Please let me know if you disagree with this statement.)

    - Regardless of the -technique- used, after about 80 iterations, improvements in the likelihood function are extremely small.
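
    A minimal sketch of the kind of call I am referring to (variable names are placeholders):

        * quasi-Newton techniques, switching every 10 iterations,
        * with the default number of integration points
        cmmixlogit choice x1 x2, random(x3 x4) technique(bhhh 10 bfgs 10)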

    Based on these findings, there are a couple of additional questions I would like to ask:

    1. As mentioned by Joerg Luedicke (StataCorp), it is possible to feed a starting matrix of coefficients to -cmmixlogit- to facilitate the maximization. This thread suggests that -cmmixlogit- accepts a starting matrix even from a different estimator, such as a -mixlogit- model with only random coefficients. If true, this could be really helpful, as our last models before switching to -cmmixlogit- were estimated successfully with -mixlogit-. Can anyone confirm whether it is indeed possible to use -mixlogit- coefficient matrices (random coefficients only) in -cmmixlogit- models? (A tentative sketch follows this list.)

    2. Since our models didn't achieve convergence, I thought of loosening the tolerance a bit. In particular, I set a slightly higher -ltolerance(0.0001)- for the likelihood function. However, this setting seemed to be ignored. Do you have any idea why? Could it be because the iterations were flagged as -not concave- or -backed up-?

    3. Only once, when using -technique(bhhh)-, did Stata produce some results after 300 iterations (the default maximum). Since the model had not effectively converged, Stata issued an error (-r(430)-) and failed to execute the following -estimates save- instruction (normal behavior, of course, when an error occurs). Do you think that by -capture-ing a model that reaches the maximum number of iterations without converging I would be able to save the estimates and use them later (e.g., by -esttab-ing them)? (A second sketch follows this list.) Also, since after about 80-100 iterations the likelihood function no longer increases substantially, do you find the idea of setting a lower -iterate(#)- acceptable?

    4. Until now I have assumed that changing -intmethod- to Halton would not make much difference. From experience, do you think this assumption might be incorrect?
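
    On question 1, a tentative sketch of passing starting values through the standard -from()- maximization option (variable names are placeholders, and whether the coefficient names match across the two commands would need to be checked on the actual output):

        * fit the simpler model and store its coefficient vector
        mixlogit choice x1 x2, group(caseid) rand(x3 x4)
        matrix b_start = e(b)

        * pass it as starting values; -skip- matches parameters by name
        * and drops those that do not appear in the new model
        cmmixlogit choice x1 x2, random(x3 x4) from(b_start, skip)

    On question 3, a sketch of what I have in mind (same placeholder names); if the results are still posted in -e()- after -r(430)-, then -capture- should let the do-file continue to -estimates save-:

        capture noisily cmmixlogit choice x1 x2, random(x3 x4) iterate(100)
        if _rc == 0 | _rc == 430 {
            estimates save model_noconv, replace
        }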

    Thanks so much again for any idea you are able to share.



    • #17
      Riccardo, from what you describe, it sounds to me that your model is just not well empirically identified, or at least is very weakly identified at best. One indication of this is that you are seeing the optimizer iterate with little change in the values of the log likelihood, indicating a "flat" likelihood surface. If you were to use the -trace- option, which shows the parameter vector at each iteration, I suspect that you would see rather large changes in the parameters (mainly in the variance parameters, I suppose) relative to the changes in the log likelihood, which would be a clear sign of non-identification.

      A few questions: How many cases, i.e., choice sets (not observations), do you actually have? How many alternatives, and do the alternatives vary across choice sets? How many alternative-specific variables are in the model, and how many of these are to have random coefficients?

      Also, have you tried to fit the model with fixed coefficients only? If that succeeds, perhaps build from there and add one random coefficient at a time (a sketch follows below). That being said, sometimes convergence difficulties can arise if one or more of the variance parameters are very close to zero, in which case the undocumented option -scalemetric(unconstrained)- may help the model to converge.
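
      A minimal sketch of that build-up strategy (variable names are placeholders):

          * step 1: fixed coefficients only
          cmmixlogit choice x1 x2 x3

          * step 2: add random coefficients one at a time, reusing the
          * previous estimates as starting values (matched by name)
          matrix b = e(b)
          cmmixlogit choice x2 x3, random(x1) from(b, skip)

          * if a variance parameter sits very close to zero, the
          * undocumented option mentioned above may help:
          * cmmixlogit ..., scalemetric(unconstrained)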



      • #18
        Hi Joerg, many thanks for your further thoughts; some of the things you mentioned might be very helpful. The data structure seems fine: we have 354 alternatives (i.e., possible locations for each foreign investment) and 1,270 investments, for a total of 449,580 observations; the alternatives do not vary across investments. Moreover, we have 15 alternative-specific variables, all set as random coefficients, and 3 case-specific variables (i.e., variables referring to the companies making the investments).

        For sure we are encountering a flat region: Stata mentions this multiple times across iterations. Perhaps the problem is that the case-specific variables contain a number of missing values (about 20-30%), so the matrix of case-specific variables does not have the same dimension as that of the alternative-specific variables. I assumed this wouldn't be a problem, because Stata would subset the matrix of alternative-specific regressors as needed, but perhaps it causes internal issues; I am not sure whether this is the case. I can try to subset the dataset myself using -if-, to make sure both groups of variables are entirely populated with non-missing values, and see whether this solves the problem (a sketch follows below).
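
        Something along these lines, assuming the data are -cmset- with case identifier caseid and case-specific variables z1-z3 (all placeholder names):

            * flag cases with any missing case-specific value
            egen miss = rowmiss(z1 z2 z3)
            bysort caseid: egen case_miss = max(miss)

            * fit on complete cases only
            cmmixlogit choice x1 x2 if case_miss == 0, random(x3) casevars(z1 z2 z3)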

        Moreover, I can specify the -scalemetric(unconstrained)- option, as you suggested.

        Furthermore, if these measures have no effect, I can start by specifying all alternative-specific variables as fixed instead of random and switch them to random one by one... even though, when we estimated the model with the simpler -mixlogit-, specifying all random coefficients didn't cause any problems. Do you think this might therefore be an issue specific to -cmmixlogit-?

        Thanks again for the advice.



        • #19
          Missing values in the case-specific variables won't cause a problem in the sense that Stata will discard incomplete cases prior to fitting the model. When you used -mixlogit-, did you specify alternative-specific constants? By default, -mixlogit- does not include them, whereas -cmmixlogit- includes ASCs by default. With 354 alternatives, that's 353 constants, which is a lot. Also, did you include case-specific variables when you used -mixlogit-? Since for case-specific variables we estimate alternative-specific coefficients, that's another 353 estimated parameters per variable.

          So, just accounting for the ASCs and the three case-specific variables, that's 1,412 parameters to be estimated (in the best case, where the variables are either continuous or at most binary categorical). Adding 15 random coefficients on top of that, with data that have this many alternatives, I am not surprised by the convergence issues. You could consider whether it makes sense to leave out the ASCs (by using option -noconstant-) and the case-specific variables; at least that might be a good starting point.
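
          A sketch of that starting point (placeholder variable names):

              * no ASCs and no case-specific variables, fixed coefficients only
              cmmixlogit choice x1 x2 x3, noconstant

              * if that converges, reintroduce random coefficients gradually
              cmmixlogit choice x2 x3, random(x1) noconstant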



          • #20
            How much variation is there in your alternative-specific covariates *within* alternatives? If you have ASCs in your model but very little variation in the covariates within alternatives, then you're asking the model to do a lot of work, and even -clogit- might have trouble coming up with parameter estimates. Asking it to estimate, on top of that, mean and variance parameters for the alternative-specific coefficients in the presence of ASCs and ASC-by-firm-covariate interactions is just a bridge too far.
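
            A quick way to check, assuming an alternative identifier alt and covariate x1 (placeholder names):

                * within-alternative standard deviation of an
                * alternative-specific covariate
                bysort alt: egen sd_x1 = sd(x1)
                summarize sd_x1
                * values at or near zero flag covariates that barely
                * vary within alternatives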



            • #21
              Thanks so much Joerg Luedicke (StataCorp) and Ben Jarvis for the help. I have been running models for a few weeks now, and it turns out your advice helped a great deal: after switching all alternative-specific variables to fixed and suppressing the alternative-specific constants, the model finally converged. Since some variables have a small variance, I also used the -scalemetric(unconstrained)- option, together with -difficult-. Thanks again for these suggestions.

              I also have one last question, for which I cannot find a clear answer in the Stata help files. Is it possible to create interaction effects between alternative-specific variables and case-specific ones? If yes, where should one enter such interactions: in the initial part of the regression call, where the fixed coefficients are specified? If that is not possible, is it perhaps possible to draw -margins- plots in which one manipulates not only the level of the alternative-specific variables but also that of the case-specific ones? (I hope the question is clear; please let me know if you would rather I open a separate thread for it.)
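
              For what it is worth, here is the kind of call I have in mind, assuming -cmmixlogit- accepts factor-variable notation in the alternative-specific varlist (x1 alternative-specific, z1 case-specific; all names are placeholders, and this would need to be checked against the documentation):

                  * the interaction entered as an alternative-specific term
                  cmmixlogit choice c.x1 c.x1#c.z1, random(x2) casevars(z1)

                  * margins over combinations of both variables (values illustrative);
                  * factor-variable notation lets -margins- account for the interaction
                  margins, at(x1 = (0 1) z1 = (0 1))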
