
  • Speeding up computation time of -asmixlogit-/-cmmixlogit-

    I am working on a model that analyzes the determinants of the probability that a company locates a new plant at a specific location, chosen from a list of possible locations, as a function of a set of location-specific characteristics. The sample contains 354 possible locations and 1,270 investments in new plants, for a total of 449,580 observations.

    To estimate this model, so far we have used a mixed-logit model containing only variables with random coefficients. The regression call read something like this:

    Code:
    mixlogit location, group(investment_id) cluster(firm_id) id(firm_id) nrep(50) rand(loc_charact_1 loc_charact_2...loc_charact_15)
    I ran this model on a supercomputer and it was successfully estimated in about an hour.

    Things got complicated when we decided to also add case-specific variables, i.e. firm characteristics. At this point, we switched to an alternative-specific mixed logit model (the command used to be -asmixlogit-, but the Stata help file mentions it has been replaced by -cmmixlogit-, so I used the latter). After realizing how hard it is to get these models to finish a single estimation, and building on the valuable insights contained in this post, I tried to run a trial model using the commands:

    Code:
    cmset investment_id location_id
    cmmixlogit location, random(loc_charact_1 loc_charact_2...loc_charact_15) casevars(firm_charact_1...firm_charact_3) favor(speed) intmethod(halton, antithetics)
    However, I have not been able to get a single estimation to finish. On the supercomputer we are using, the maximum run time is 120 hours, and within that limit the model has so far not converged. Therefore, my questions are the following:

    1. Is it correct that -asmixlogit- is equivalent to -cmmixlogit-?

    2. In the post linked above, it is suggested that computation can be sped up by opting for a different integration method (-intmethod-), setting a lower number of integration points (-intpoints-), and using a different maximization algorithm (-technique-). What would be the best combination of these options? What would be, for instance, a lower yet acceptable number of integration points? And what would be a sensible combination of maximization algorithms and their respective iteration limits?

    Any help is extremely appreciated.
    Last edited by Riccardo Valboni; 24 Jan 2022, 10:31.

  • #2
    You may find this earlier post by Hong Il Yoo to a similar question useful.

    https://www.statalist.org/forums/for...51#post1623351



    • #3
      Thanks so much for your reply.

      I had seen Hong Il Yoo's post as well. We are already using Stata MP. The problem is that -cmmixlogit- offers limited parallelization: Stata MP uses 2 cores out of 16 and 10 GB of memory out of 100. I don't think there is an option to force Stata to use more CPU and memory, but I might be wrong.

      As for the other alternatives offered, we wouldn't know how to implement the EM and MM algorithms that were suggested.

      At the moment, we are looking for the set of options that is most likely to result in the fastest estimation. If this isn't enough, perhaps we will have to switch to another estimator.
      Last edited by Riccardo Valboni; 24 Jan 2022, 11:57.



      • #4
        The problem is that -cmmixlogit- offers limited parallelization
        Do you have a reference for this limitation, or is it an empirical finding based on your experience?

        If the latter, are you sure your Stata MP is set up correctly? On a server to which I have access, my virtual desktop Stata installation shows
        Code:
        . display c(processors)
        4
        
        . display c(processors_lic)
        4
        
        . display c(processors_mach)
        4
        
        . display c(processors_max)
        4
        
        . about
        
        Stata/MP 17.0 for Windows (64-bit x86-64)
        Revision 17 Jan 2022
        Copyright 1985-2021 StataCorp LLC
        
        Total physical memory:       16.44 GB
        Available physical memory:   13.64 GB
        
        Stata license: 83-user 4-core network, ...
        I would expect you to have 10s where I have 4s, unless you too are using a desktop client, in which case the clients may in fact be configured with less than the full complement of CPUs and memory, so that there is some hope of multiple clients accomplishing more than fighting over CPUs and memory.

        Certainly the -set processors- command allows control of the number of processors in use, up to the number licensed. And Stata uses as much memory as it can obtain; if it only used 10 GB, that was all it needed for the degree of parallelization implemented.
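
        For example, a minimal sketch (the count you set is capped by your license and your machine):
        Code:
        . set processors 2
        
        . display c(processors)
        2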



        • #5
          You are absolutely right. Mine was an empirical observation, and it turned out we only had a license for a maximum of 2 cores. I didn't think it worked that way: I thought a Stata MP license gave you access to the full power of Stata MP. I will check whether I can have the license upgraded to increase the number of cores. Do you know if more information is available on the maximum parallelization of -asmixlogit- or -cmmixlogit- and their performance as a function of the number of cores? I found this report, which seems fairly up to date, but it doesn't contain any info on -asmixlogit- or -cmmixlogit-, even though they seem to be parallelized to some extent.

          I am still interested in hearing what might be a possible mix of options that could minimize the computation time, though increasing the number of cores certainly will help achieve the goal.



          • #6
            Riccardo Valboni: I haven't used either command before. My experience with self-written estimation programs for non-linear index models with random coefficients, however, suggests that using 2 cores reduces the estimation run time by approximately 50%; 4 cores by 67%; 8 cores by 76%; and 16 cores by 80%. Each percentage is defined relative to the estimation run time using 1 core, and the results are based on Stata 15/MP running on a CPU with 16 physical cores. To be slightly more specific, the model specification was a rank-dependent utility model with a CRRA utility function and a two-parameter Prelec probability weighting function, and there were three random coefficients in total.
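
            Put differently, those reductions correspond roughly to the following speedup factors relative to 1 core (my own back-of-the-envelope arithmetic, computed as one over the remaining fraction of run time):
            Code:
            display 1/(1 - 0.50)   //  2 cores: about 2.0 times faster
            display 1/(1 - 0.67)   //  4 cores: about 3.0 times faster
            display 1/(1 - 0.76)   //  8 cores: about 4.2 times faster
            display 1/(1 - 0.80)   // 16 cores: about 5.0 times faster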

            As a quick and dirty check on whether upgrading your licence makes sense, perhaps you can do your own comparison of how well -asmixlogit- and -cmmixlogit- perform with 2 cores relative to 1 core. To get the 1-core results, you can type -set processors 1- in Stata before running your estimation; to get the 2-core results, type -set processors 2- before running your estimation.
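
            For instance, something along these lines (a rough sketch only; -timer- and -set processors- are built-in commands, the -cmmixlogit- call is a placeholder for your actual specification, and the sketch assumes the data are already -cmset- and the loc_charact_* variables are stored consecutively):
            Code:
            timer clear
            set processors 1
            timer on 1
            cmmixlogit location, random(loc_charact_1-loc_charact_15) intpoints(50)
            timer off 1
            set processors 2
            timer on 2
            cmmixlogit location, random(loc_charact_1-loc_charact_15) intpoints(50)
            timer off 2
            timer list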
            Last edited by Hong Il Yoo; 25 Jan 2022, 05:39.



            • #7
              I didn't think it worked that way: I thought a Stata MP license gave you access to the full power of Stata MP.
              I think of this as Stata offering generous discounts to users whose hardware budget does not extend to the 64-core supercomputers necessary to access the full power of Stata MP, but who would benefit from the speedup offered by using at least 2 or 4 cores, which all modern computers have.

              Apparently you did not encounter the product information at https://www.stata.com/products/which...-right-for-me/ or the pricing information at https://www.stata.com/order/ explaining that Stata MP is licensed by the number of cores, up to 64. So should you upgrade to, say, a 10-core license (which can be done, I believe, without needing to reinstall the software), you will likely get the benefit without paying for the full power of Stata MP, which requires a 64-core processor.

              Tiered pricing of software licenses based on factors like the number of named users, the number of concurrent users, the volume of data, the hardware configuration, and other features of the operating environment has been part of software licensing since the earliest days of mainframe computing, when US antitrust law required IBM to unbundle its software from its hardware back in the 1960s.
              Last edited by William Lisowski; 25 Jan 2022, 07:15.



              • #8
                Thank you both so much for the useful insights. It seems there might be a distinct advantage in increasing the number of cores from 2 to 8, but not so much from 2 to 4.

                Hong Il Yoo, based on your answers to the post I linked above, and given the sample size I am dealing with, do you think it would make sense to use the following options in the regression call (you can just focus on the options, as the rest of the command should be standard)?

                Code:
                asmixlogit location, random(loc_charact_1 loc_charact_2...loc_charact_15) case(investment_id) alternatives(location_id) casevars(firm_charact_1...firm_charact_3) favor(speed) intmethod(halton, mantithetics) intpoints(2500) technique(bhhh 10 nr 1000)

                Another related question: can you think of alternative, less computationally intensive estimators to use in case the alternative-specific mixed logit does not converge within 120 hours?

                I am really grateful for any thoughts.
                Last edited by Riccardo Valboni; 25 Jan 2022, 09:31.



                • #9
                  Riccardo, from your description it is not clear whether your model is not converging or whether convergence just takes a long time. For the initial modeling stage, I suggest starting with the default integration method but using a low number of integration points. By default, cmmixlogit and cmxtmixlogit use a very conservative (i.e. high) number of integration points, which depends on model complexity. To start, try intpoints(50) or intpoints(100) and see whether the model converges (I believe the default for mixlogit is fixed at 50). If the model converges, you can start increasing the number of integration points to see if the results remain stable. At that stage, you can use the results from the previous stage as starting values, which can save a lot of time, for example:
                  Code:
                  cmmixlogit ... , ... intpoints(50)
                  mat binit = e(b)
                  cmmixlogit ... , ... intpoints(500) from(binit)
                  As for parallelization, unfortunately cmmixlogit and cmxtmixlogit do not benefit much from Stata's MP features, and so obtaining a license for a higher number of processors will probably not help much in this case.

                  I hope this helps,
                  Joerg



                  • #10
                    Riccardo, I have seen your most recent post after I wrote mine. There you show a specification with
                    Code:
                    intmethod(halton, mantithetics) intpoints(2500)
                    With multidimensional antithetic draws, notice that the effective number of integration points is 2^d * q, where d is the number of random coefficients and q is the number of integration points specified in intpoints(). That is, if you have 15 random coefficients, you will end up with 2^15 * 2,500 = 81,920,000 integration points, a number I would consider infeasible in this context.
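
                    For reference, the arithmetic behind that figure:
                    Code:
                    display 2^15 * 2500   // 32,768 * 2,500 = 81,920,000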



                    • #11
                      Joerg, many thanks, this is incredibly helpful. I am not sure whether the model is failing to converge or whether convergence just takes a long time. Usually, when a model does not converge, Stata shows the final iterations; I remember seeing this in simple logit models. Here, instead, the log file shows the launch of the regression and then nothing else. Below the command, it shows only:

                      Code:
                      Fitting fixed parameter model:
                      
                      Fitting full model
                      And nothing else.

                      I am going to try your suggested solution and post an update whether or not I succeed.



                      • #12
                        Originally posted by Joerg Luedicke (StataCorp):
                        With multidimensional antithetic draws, notice that the effective number of integration points is 2^d * q, where d is the number of random coefficients and q is the number of integration points specified in intpoints(). That is, if you have 15 random coefficients, you will end up with 2^15 * 2,500 = 81,920,000 integration points, a number I would consider infeasible in this context.
                        Indeed, I will start with 50 as you suggested and increase in small steps.



                        • #13
                          Riccardo Valboni:

                          I agree with Joerg that you can start off with a smaller number of draws. You can also consider switching off the -mantithetics- option if memory size is an issue. I often use -technique(bfgs)- or -technique(bfgs 40 nr 10)- in non-linear optimisation, though I haven't worked with the two commands you're interested in yet.
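
                          For concreteness, a trial call combining these suggestions might look something like the sketch below (untested, since I haven't used these commands myself; it reuses the placeholder variable names from your earlier posts and assumes those variables are stored consecutively in the dataset):
                          Code:
                          cmset investment_id location_id
                          cmmixlogit location, random(loc_charact_1-loc_charact_15)   ///
                              casevars(firm_charact_1-firm_charact_3) favor(speed)    ///
                              intmethod(halton) intpoints(50) technique(bfgs 40 nr 10)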

                          As regards alternative estimators, you may consider Matthew Baker's -bayesmixedlogit- command and apply the Bayesian procedure as a gradient-free computational method, as Train explains in his textbook (Discrete Choice Methods with Simulation, 2nd ed, Ch 12: https://eml.berkeley.edu/books/choic...2_p282-314.pdf). Again, I have no direct experience with this command but given Train's discussion, I expect the procedure to run faster than the MSL procedure considering that your model specification only includes random coefficients.

                          Alternatively, if you're happy to assume that the mixing distribution is categorical rather than normal, you may consider my -lclogit2- command (https://doi.org/10.1177/1536867X20931003).
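
                          Both are community-contributed commands, so they would need to be installed first; something like the following should locate the packages (I mention -search- rather than a specific repository, since installation sources can change):
                          Code:
                          search bayesmixedlogit
                          search lclogit2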



                          • #14
                            Hi Hong Il Yoo, many thanks for this further advice! I think that, together with setting the number of integration points to a minimum, changing -technique- could really help achieve convergence. I dropped the -mantithetics- option and, even with just 50 integration points as suggested by Joerg, the model is not converging. It has been running for 48 hours now and I am afraid it's stuck somewhere in a flat region. I will leave it running up to the maximum of 120 hours just in case, and, if it doesn't produce any estimates, I will restart it with -technique- set the way you advised.



                            • #15
                              Hi, a small update: I think Hong Il Yoo's advice to use -technique(bfgs)- is the way to go. I actually specified the latter option, i.e. -technique(bfgs 40 nr 10)-, and the model did 40 iterations within about 24 hours of starting and then got stuck; it has now been stuck for 48 hours. I conclude that it's probably better to go with -bfgs- alone and, in case the model does not converge, to gradually increase the number of integration points. Please let me know if you think any of these ideas sounds incorrect.
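
                              Concretely, the plan looks something like this (just a sketch, reusing the placeholder names from my earlier posts and Joerg's starting-values trick):
                              Code:
                              * stage 1: bfgs only, minimal number of integration points
                              cmmixlogit location, random(loc_charact_1-loc_charact_15)   ///
                                  casevars(firm_charact_1-firm_charact_3) favor(speed)    ///
                                  intpoints(50) technique(bfgs)
                              matrix binit = e(b)
                              
                              * stage 2: more integration points, previous estimates as starting values
                              cmmixlogit location, random(loc_charact_1-loc_charact_15)   ///
                                  casevars(firm_charact_1-firm_charact_3) favor(speed)    ///
                                  intpoints(100) from(binit) technique(bfgs)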
                              Last edited by Riccardo Valboni; 02 Feb 2022, 01:56.

