Must the choices in the asclogit model be mutually exclusive?

Sanford Pitcher

Join Date: Nov 2014

Posts: 6
#1


02 Dec 2014, 20:20

Statalist,

I understand that the choices within discrete choice models should be mutually exclusive and exhaustive. Is this true for the choices in an asclogit model? I know this isn't a good way to go about confirming this, but I've tried to run the asclogit command with the auto dataset as described in http://www.stata.com/manuals13/rasclogit.pdf and changing the choices such that they are not mutually exclusive, yet the model still runs.

In contrast, when I run asclogit on my real dataset, I get the following error: "variable X has replicate levels for one or more cases; this is not allowed"

Any thoughts?
ben earnhart

Join Date: May 2014

Posts: 1027
#2

02 Dec 2014, 20:29

Please provide actual commands/syntax and the errors you get. Paste them into code blocks (hit the "A" above where you type, then "#"). The way things stand, it's impossible to tell why you get the results (lack thereof) that you do. Somebody with extensive experience with asclogit might have a simple answer, but in general, seeing the actual commands and results is useful or essential to diagnose problems. I was able to verify that it runs fine with multiple choices selected (I made 10% of them choose Japanese cars, even if they *also* chose American or European), and it runs. But dunno what's different about your data or commands you run on your own data.

Last edited by ben earnhart; 02 Dec 2014, 20:44.
ben earnhart

Join Date: May 2014

Posts: 1027
#3

02 Dec 2014, 20:58

Ah. I played around a bit. Is your data balanced (if there are x choices, you need x obs per case)? If I mangle the example data by giving one case a fourth observation and another case only two, it gets the error you got. So the error is a bit misleading; they *can* choose more than one of the alternatives, but for the model, they all need the same alternatives (# of alternatives). Hope this makes sense.

Arne Risa Hole

Join Date: Apr 2014
Posts: 130

03 Dec 2014, 06:29

I find it very surprising that asclogit allows more than one alternative to be chosen since the random utility theory that the McFadden model is derived from assumes that a single alternative is chosen.

If you have a dataset with more than one chosen alternative per individual I would expand the data so that each choice appears as a separate observation (or a separate group of observations, to be more precise). An example is given below – note that this gives different results from running asclogit directly on the original data.

Code:

webuse choice

* recode data so that 10% of sample always choose a Japanese
* car even if they also choose US or European
set seed 12345
bysort id (car): egen rnd = total(runiform()*(_n==_N))
replace choice = 1 if car==2 & rnd <.1

asclogit choice dealer, case(id) alternatives(car) casevars(sex income)

* expand data so that there is one group per alternative
expandcl 3, generate(newid) cluster(id)

* recode choice variable so that only one alternative
* is chosen per group and drop groups with no chosen alternatives
bysort id newid (car): gen dupno = _n==1
bysort id (newid car): replace dupno = sum(dupno)
replace choice = 0 if dupno==1 &  (car==2 | car==3)
replace choice = 0 if dupno==2 &  (car==1 | car==3)
replace choice = 0 if dupno==3 &  (car==1 | car==2)
bysort id newid (car): egen nchoice = total(choice)
drop if nchoice==0

asclogit choice dealer, case(newid) alternatives(car) casevars(sex income)

The “variable X has replicate levels for one or more cases; this is not allowed” error message appears when an alternative (e.g. Japan) is repeated within a choice set. This suggests that something is wrong with the data setup.
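
One way to check for this in your own data is sketched below. The variable names id, car and choice follow the example above and are assumptions; substitute your own case, alternative and outcome variables. The new variables ndup, nchosen and onepercase are made up for illustration.

Code:

* rows that repeat the same alternative within a case trigger the error
duplicates report id car
duplicates tag id car, generate(ndup)
list id car if ndup > 0, sepby(id)

* number of chosen alternatives per case
* (1 for every case if the choices are mutually exclusive)
bysort id: egen nchosen = total(choice)
egen byte onepercase = tag(id)
tabulate nchosen if onepercase

If the first check lists any rows, the alternative variable does not uniquely identify the rows within a case, which points to a data setup problem rather than to multiple chosen alternatives.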

Arne

Last edited by Arne Risa Hole; 03 Dec 2014, 06:32.


daniel klein

Join Date: Mar 2014

Posts: 3862
#5

03 Dec 2014, 06:34

Is this not basically a fixed-effects (conditional) logit model? Why should the number of 1s be restricted to one in such a model?

Best
Daniel
Arne Risa Hole

Join Date: Apr 2014

Posts: 130
#6

03 Dec 2014, 06:41

Daniel

No, these are two different models. See the second post in http://www.statalist.org/forums/foru...al-logit-model. You are right that there is no restriction on the number of 1s in a fixed-effects logit model but there is (or should be) in the McFadden model.

Arne
daniel klein

Join Date: Mar 2014

Posts: 3862
#7

03 Dec 2014, 06:53

That is an interesting claim, since the estimation is basically the same, with just some interaction terms added in McFadden's model. I admit I would need to do some re-reading to judge whether mutually exclusive choices are indeed a requirement/assumption underlying this model. I will do so as soon as I find the time. However, for now this raises the question of why the help file for asclogit explicitly states that

There can be multiple alternatives chosen for each case.

If you are correct then the manual is clearly misleading here.

Best
Daniel
Arne Risa Hole

Join Date: Apr 2014

Posts: 130
#8

03 Dec 2014, 07:23

Daniel

Yes, computationally they are essentially the same, but they are substantively different models. All standard textbooks describing the McFadden model (e.g. http://eml.berkeley.edu/books/choice2.html) start from the assumption that the alternatives are mutually exclusive; it is not my claim.

I agree with you that the manual seems to be misleading here.

Arne
daniel klein

Join Date: Mar 2014

Posts: 3862
#9

03 Dec 2014, 07:35

I am still not deep enough into this, but I see Arne's point. The underlying "theory" of random utility (or, more broadly, economic rational-choice theory) states that the chosen alternative has the maximum utility, which more or less explicitly states that there is one alternative that has the maximum utility. I am not sure yet how relevant this theoretical reasoning really is for the statistical model.

Say, we buy the theory. There is one alternative that has the maximum utility, and is therefore chosen. But we have data, where individuals have chosen more than one alternative. I do not believe that restructuring the dataset, as suggested here, will solve this theoretical problem. Some individuals have chosen more than one alternative and regrouping the data will not change that fact. It does however potentially introduce new problems. Those individuals that have chosen more than one alternative now appear more than one time in the dataset, which will almost certainly bias the standard errors [and maybe even the point estimates - but I have no clear idea here].

Best
Daniel

Last edited by daniel klein; 03 Dec 2014, 07:42. Reason: whether point estimates are affected is not clear to me
Arne Risa Hole

Join Date: Apr 2014

Posts: 130
#10

03 Dec 2014, 07:53

Daniel

I agree that expanding the data is not a perfect fix, but it offers a practical solution to the problem of some respondents choosing more than one alternative. I would personally prefer that solution to running asclogit on the original data, when I don't know exactly how asclogit deals with multiple chosen alternatives. But it's just a suggestion and I'm happy for us to disagree. (If you are worried about the SEs you can cluster at the respondent level)
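
A minimal sketch of what the clustering could look like, assuming the expanded data from the expansion step earlier in the thread are in memory (id is the original respondent identifier, newid identifies the expanded groups):

Code:

asclogit choice dealer, case(newid) alternatives(car) casevars(sex income) vce(cluster id)

The point estimates are unchanged by this option; only the standard errors are computed allowing for correlation between the groups that belong to the same respondent.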

Arne
ben earnhart

Join Date: May 2014

Posts: 1027
#11

03 Dec 2014, 08:28

Do you have anything that plays the role of dealer in your model? In the example data, they basically make a choice of dealer *and* type of car. If they don't have a secondary choice like that, and the choices are mutually exclusive, then collapsing the data and running it as a multinomial logit seems like an attractive approach. -mlogit- models are better understood and give you all the bells and whistles regarding output and post-estimation commands that standard regression models do, whereas asclogit is an oddity. But, if you have the equivalent of a dealer intervening in the choice, then I guess you're stuck with asclogit.
daniel klein

Join Date: Mar 2014

Posts: 3862
#12

03 Dec 2014, 08:30

Originally posted by Arne Risa Hole

If you are worried about the SEs you can cluster at the respondent level
Arne

That would be the original id variable, and this is what I would add to your suggestion.

However, the link you provide offers another alternative, which I would prefer. Instead of adding observations, we could add choices. In the example above, we would add the choices "US and Japan" and "European and Japan" for all observations in the dataset. This way the choices are mutually exclusive and we have no "fake" observations in the dataset.

I have not figured out the code to do so, but if Sanford (or anyone else) is still interested, I can do so.

Best
Daniel

ben earnhart

Join Date: May 2014
Posts: 1027

#13

03 Dec 2014, 08:52

Those individuals that have chosen more than one alternative now appear more than one time in the dataset, which will almost certainly bias the standard errors [and maybe even the point estimates - but I have no clear idea here].

I collapsed the example data to one observation per case, and ran mlogit. See below. Identical models if you don't have the secondary step, choice of dealer.

asclogit:

Code:

Alternative-specific conditional logit         Number of obs      =        885
Case variable: id                              Number of cases    =        295

Alternative variable: car                      Alts per case: min =          3
                                                              avg =        3.0
                                                              max =          3

                                                  Wald chi2(4)    =      12.53
Log likelihood = -252.72012                       Prob > chi2     =     0.0138

------------------------------------------------------------------------------
      choice |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
American     |  (base alternative)
-------------+----------------------------------------------------------------
Japan        |
         sex |  -.4694799   .3114939    -1.51   0.132    -1.079997     .141037
      income |   .0276854   .0123666     2.24   0.025     .0034472    .0519236
       _cons |  -1.962652   .6216804    -3.16   0.002    -3.181123   -.7441807
-------------+----------------------------------------------------------------
Europe       |
         sex |   .5388441   .4525279     1.19   0.234    -.3480942    1.425782
      income |   .0273669    .013787     1.98   0.047      .000345    .0543889
       _cons |  -3.180029   .7546837    -4.21   0.000    -4.659182   -1.700876
------------------------------------------------------------------------------

mlogit:

Code:

Multinomial logistic regression                   Number of obs   =        295
                                                  LR chi2(4)      =      12.90
                                                  Prob > chi2     =     0.0118
Log likelihood = -252.72012                       Pseudo R2       =     0.0249

------------------------------------------------------------------------------
         car |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
American     |  (base outcome)
-------------+----------------------------------------------------------------
Japan        |
         sex |  -.4694798   .3114939    -1.51   0.132    -1.079997    .1410371
      income |   .0276854   .0123666     2.24   0.025     .0034472    .0519236
       _cons |  -1.962651   .6216803    -3.16   0.002    -3.181122   -.7441801
-------------+----------------------------------------------------------------
Europe       |
         sex |   .5388443   .4525278     1.19   0.234     -.348094    1.425783
      income |    .027367    .013787     1.98   0.047      .000345    .0543889
       _cons |   -3.18003   .7546837    -4.21   0.000    -4.659182   -1.700877
------------------------------------------------------------------------------
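
For anyone wanting to reproduce the comparison above, something along these lines should work. This is a sketch that assumes the unmodified webuse choice data (exactly one chosen alternative per case), leaves dealer out of the asclogit model, and assumes car==1 is the American alternative:

Code:

webuse choice, clear

* asclogit with case-specific variables only
asclogit choice, case(id) alternatives(car) casevars(sex income)

* collapse to one observation per case by keeping the chosen alternative
keep if choice == 1
mlogit car sex income, baseoutcome(1)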

Last edited by ben earnhart; 03 Dec 2014, 08:59.


daniel klein

Join Date: Mar 2014
Posts: 3862

#14

03 Dec 2014, 09:00

Here is (integrated) code that adds choices instead of observations. Results are stored in m2, alongside the original results in m1 and those from the model with added observations in m3.

Code:

webuse choice ,clear

* recode data so that 10% of sample always choose a Japanese
* car even if they also choose US or European
set seed 12345
bysort id (car): egen rnd = total(runiform()*(_n==_N))
replace choice = 1 if car==2 & rnd <.1

asclogit choice dealer, case(id) alternatives(car) casevars(sex income)
est sto m1

// adding alternatives
preserve
keep id car choice
reshape wide choice ,i(id) j(car)
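g byte choice4 = (choice1 == 1) & (choice2 == 1)
g byte choice5 = (choice3 == 1) & (choice2 == 1)
// switch off both components of a combined choice so that
// exactly one alternative is chosen per case
replace choice1 = 0 if (choice4 == 1)
replace choice3 = 0 if (choice5 == 1)
replace choice2 = 0 if (choice4 == 1) | (choice5 == 1)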
reshape long choice ,i(id) j(car)
tempfile tmp
sa `tmp'
restore

preserve

drop car choice
mer m:m id using `tmp' ,nogen 
    // yes m:m merges are in general a bad idea
la de nation 4 "US and Japan" 5 "European and Japan" ,modify

asclogit choice dealer, case(id) alternatives(car) casevars(sex income)
est sto m2

restore

* expand data so that there is one group per alternative
expandcl 3, generate(newid) cluster(id)

* recode choice variable so that only one alternative
* is chosen per group and drop groups with no chosen alternatives
bysort id newid (car): gen dupno = _n==1
bysort id (newid car): replace dupno = sum(dupno)
replace choice = 0 if dupno==1 &  (car==2 | car==3)
replace choice = 0 if dupno==2 &  (car==1 | car==3)
replace choice = 0 if dupno==3 &  (car==1 | car==2)
bysort id newid (car): egen nchoice = total(choice)
drop if nchoice==0

asclogit choice dealer, case(newid) alternatives(car) casevars(sex income)

est sto m3

est tab m1 m2 m3

Best
Daniel


Arne Risa Hole

Join Date: Apr 2014

Posts: 130
#15

03 Dec 2014, 10:26

Daniel

Thanks. I agree that your suggestion is a possible alternative solution that is more consistent with the underlying random utility model. Just one comment: I don't think your code as it stands deals with the issue of which values to assign to the alternative-specific dealer variable for the new "US and Japan" and "Europe and Japan" alternatives. One solution in this example might be to assign the value of dealer for US plus the value for Japan to "US and Japan" and the value for Europe plus the value for Japan to "Europe and Japan". More generally though your solution raises the new challenge of assigning appropriate values for alternative-specific variables to the new alternatives, which might be tricky. Another issue in this specific example is that since "US and Japan" and "Europe and Japan" are chosen by few individuals (in our made-up data) the coefficients for the individual-specific variables are very imprecisely estimated [Edit: individuals are also randomly assigned to these alternatives, of course, which is more important here. But creating new alternatives could mean that some alternatives are chosen by very few individuals, if only a small number of individuals chose more than one alternative].
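
A rough sketch of the first idea (summing the dealer values), assuming the recoded example data are in memory with three rows per id (car 1 = US, 2 = Japan, 3 = Europe) and before any expansion. The names us_jp, eu_jp, dealer_usjp, dealer_eujp and added are made up for illustration, and whether a simple sum is a sensible value for dealer is itself an assumption:

Code:

* flag cases that chose Japan together with US or Europe
bysort id (car): gen byte us_jp = (choice[1] == 1) & (choice[2] == 1)
by id: gen byte eu_jp = (choice[3] == 1) & (choice[2] == 1)

* dealer values for the combined alternatives (assumed: simple sums)
by id: gen dealer_usjp = dealer[1] + dealer[2]
by id: gen dealer_eujp = dealer[3] + dealer[2]

* copy the US and Europe rows and turn the copies into alternatives 4 and 5
expand 2 if car != 2, generate(added)
replace car = 4 if added == 1 & car == 1
replace car = 5 if added == 1 & car == 3
replace dealer = dealer_usjp if car == 4
replace dealer = dealer_eujp if car == 5

* exactly one chosen alternative per case
replace choice = us_jp if car == 4
replace choice = eu_jp if car == 5
replace choice = 0 if (car == 1 | car == 2) & us_jp == 1
replace choice = 0 if (car == 3 | car == 2) & eu_jp == 1

asclogit choice dealer, case(id) alternatives(car) casevars(sex income)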

Arne

Last edited by Arne Risa Hole; 03 Dec 2014, 10:30.
