  • Must the choices in the asclogit model be mutually exclusive?

    Statalist,

    I understand that the choices within discrete choice models should be mutually exclusive and exhaustive. Is this also true for the choices in an asclogit model? I know this isn't a good way to confirm it, but I ran the asclogit command with the auto dataset as described in http://www.stata.com/manuals13/rasclogit.pdf after changing the choices so that they are not mutually exclusive, and the model still runs.

    In contrast, when I run asclogit on my real dataset, I get the following error: "variable X has replicate levels for one or more cases; this is not allowed"

    Any thoughts?

  • #2
    Please provide the actual commands/syntax and the errors you get. Paste them into code blocks (hit the "A" above where you type, then "#"). As things stand, it's impossible to tell why you get the results (or lack thereof) that you do. Somebody with extensive experience with asclogit might have a simple answer, but in general, seeing the actual commands and results is useful or essential for diagnosing problems. I was able to verify that it runs fine with multiple choices selected (I made 10% of them choose Japanese cars, even if they *also* chose American or European). But I don't know what's different about your data or the commands you run on your own data.
    Last edited by ben earnhart; 02 Dec 2014, 20:44.


    • #3
      Ah. I played around a bit. Is your data balanced (if there are x choices, you need x obs per case)? If I mangle the example data by giving one case a fourth observation and another only two, I get the error you got. So the error is a bit misleading; they *can* choose more than one of the alternatives, but for the model, all cases need the same alternatives (same number of alternatives). Hope this makes sense.
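
      To see how many rows each case contributes, one can tabulate the observations per case. This is only a minimal sketch: it assumes the case variable is called id, as in the example data, and nalts and onepercase are made-up variable names.

      Code:
      * count the observations (alternatives) recorded for each case
      bysort id: generate nalts = _N
      * tag one row per case so the table counts cases, not observations
      egen byte onepercase = tag(id)
      tabulate nalts if onepercase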


      • #4
        I find it very surprising that asclogit allows more than one alternative to be chosen since the random utility theory that the McFadden model is derived from assumes that a single alternative is chosen.

        If you have a dataset with more than one chosen alternative per individual I would expand the data so that each choice appears as a separate observation (or a separate group of observations, to be more precise). An example is given below – note that this gives different results from running asclogit directly on the original data.

        Code:
        webuse choice
        
        * recode data so that 10% of sample always choose a Japanese
        * car even if they also choose US or European
        set seed 12345
        bysort id (car): egen rnd = total(runiform()*(_n==_N))
        replace choice = 1 if car==2 & rnd <.1
        
        asclogit choice dealer, case(id) alternatives(car) casevars(sex income)
        
        * expand data so that there is one group per alternative
        expandcl 3, generate(newid) cluster(id)
        
        * recode choice variable so that only one alternative
        * is chosen per group and drop groups with no chosen alternatives
        bysort id newid (car): gen dupno = _n==1
        bysort id (newid car): replace dupno = sum(dupno)
        replace choice = 0 if dupno==1 &  (car==2 | car==3)
        replace choice = 0 if dupno==2 &  (car==1 | car==3)
        replace choice = 0 if dupno==3 &  (car==1 | car==2)
        bysort id newid (car): egen nchoice = total(choice)
        drop if nchoice==0
        
        asclogit choice dealer, case(newid) alternatives(car) casevars(sex income)

        The “variable X has replicate levels for one or more cases; this is not allowed” error message appears when an alternative (e.g. Japan) is repeated within a choice set. This suggests that something is wrong with the data setup.
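
        A quick check for repeated alternatives could look like the sketch below. It assumes the case and alternatives variables are called id and car, as in the example data; dup is a made-up variable name.

        Code:
        * report how many id-car combinations occur more than once
        duplicates report id car
        * flag every row whose id-car combination is repeated
        duplicates tag id car, generate(dup)
        * list the affected cases, separated by id
        list id car if dup > 0, sepby(id)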

        Arne
        Last edited by Arne Risa Hole; 03 Dec 2014, 06:32.


        • #5
          Is this not basically a fixed-effects (conditional) logit model? Why should the number of 1s be restricted to one in such a model?

          Best
          Daniel


          • #6
            Daniel

            No, these are two different models. See the second post in http://www.statalist.org/forums/foru...al-logit-model. You are right that there is no restriction on the number of 1s in a fixed-effects logit model but there is (or should be) in the McFadden model.

            Arne


            • #7
              That is an interesting claim, since the estimation is basically the same, with just some interaction terms added in McFadden's model. I admit I would need to do some re-reading to judge whether mutually exclusive choices are indeed a requirement/assumption underlying this model. I will do so as soon as I find the time. However, for now this raises the question of why the help file for asclogit explicitly states that

              "There can be multiple alternatives chosen for each case."

              If you are correct, then the manual is clearly misleading here.

              Best
              Daniel


              • #8
                Daniel

                Yes, computationally they are essentially the same, but they are substantively different models. All standard textbooks describing the McFadden model (e.g. http://eml.berkeley.edu/books/choice2.html) start from the assumption that the alternatives are mutually exclusive; this is not just my claim.

                I agree with you that the manual seems to be misleading here.

                Arne


                • #9
                  I am still not deep enough into this, but I see Arne's point. The underlying "theory" of random utility (or, more broadly, economic rational-choice theory) states that the chosen alternative has the maximum utility, which more or less explicitly states that there is one alternative that has the maximum utility. I am not sure yet how relevant this theoretical reasoning really is for the statistical model.

                  Say, we buy the theory. There is one alternative that has the maximum utility, and is therefore chosen. But we have data, where individuals have chosen more than one alternative. I do not believe that restructuring the dataset, as suggested here, will solve this theoretical problem. Some individuals have chosen more than one alternative and regrouping the data will not change that fact. It does however potentially introduce new problems. Those individuals that have chosen more than one alternative now appear more than one time in the dataset, which will almost certainly bias the standard errors [and maybe even the point estimates - but I have no clear idea here].

                  Best
                  Daniel
                  Last edited by daniel klein; 03 Dec 2014, 07:42. Reason: whether point estimates are affected is not clear to me


                  • #10
                    Daniel

                    I agree that expanding the data is not a perfect fix, but it offers a practical solution to the problem of some respondents choosing more than one alternative. I would personally prefer that solution to running asclogit on the original data, when I don't know exactly how asclogit deals with multiple chosen alternatives. But it's just a suggestion and I'm happy for us to disagree. (If you are worried about the SEs you can cluster at the respondent level)
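
                    With the expanded data from #4, clustering on the respondent might look like the sketch below. It assumes that id still holds the original respondent identifier and that newid identifies the expanded groups, as in that code.

                    Code:
                    * cases are the expanded groups; standard errors are clustered on the respondent
                    asclogit choice dealer, case(newid) alternatives(car) casevars(sex income) vce(cluster id)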

                    Arne


                    • #11
                      Do you have anything that plays the role of dealer in your model? In the example data, they basically make a choice of dealer *and* type of car. If they don't have a secondary choice like that, and the choices are mutually exclusive, then collapsing the data and running it as a multinomial logit seems like an attractive approach. -mlogit- models are better understood and give you all the bells and whistles regarding output and post-estimation commands that standard regression models do, whereas asclogit is an oddity. But, if you have the equivalent of a dealer intervening in the choice, then I guess you're stuck with asclogit.
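
                      A rough sketch of that collapse, run on the unmodified example data. It assumes every case has exactly one chosen car, so keeping the chosen rows leaves one observation per case.

                      Code:
                      webuse choice, clear
                      * keep only the row of the chosen alternative for each case
                      keep if choice == 1
                      * car now records the outcome; sex and income are case-level covariates
                      * baseoutcome(1) makes the first category of car the base outcome
                      mlogit car sex income, baseoutcome(1)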


                      • #12
                        Originally posted by Arne Risa Hole:
                        "If you are worried about the SEs you can cluster at the respondent level"

                        That would be the original id variable, and this is what I would add to your suggestion.

                        However, the link you provide offers another alternative, which I would prefer. Instead of adding observations, we could add choices. In the example above, we would add the choices "US and Japan" and "European and Japan" for all observations in the dataset. This way the choices are mutually exclusive and we have no "fake" observations in the dataset.

                        I have not figured out the code to do so, but if Sanford (or anyone else) is still interested, I can do so.

                        Best
                        Daniel


                        • #13
                          "Those individuals that have chosen more than one alternative now appear more than one time in the dataset, which will almost certainly bias the standard errors [and maybe even the point estimates - but I have no clear idea here]."

                          I collapsed the example data to one observation per case and ran mlogit. See below. The models are identical if you don't have the secondary step, the choice of dealer.

                          asclogit:
                          Code:
                          Alternative-specific conditional logit         Number of obs      =        885
                          Case variable: id                              Number of cases    =        295
                          
                          Alternative variable: car                      Alts per case: min =          3
                                                                                        avg =        3.0
                                                                                        max =          3
                          
                                                                            Wald chi2(4)    =      12.53
                          Log likelihood = -252.72012                       Prob > chi2     =     0.0138
                          
                          ------------------------------------------------------------------------------
                                choice |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                          American     |  (base alternative)
                          -------------+----------------------------------------------------------------
                          Japan        |
                                   sex |  -.4694799   .3114939    -1.51   0.132    -1.079997     .141037
                                income |   .0276854   .0123666     2.24   0.025     .0034472    .0519236
                                 _cons |  -1.962652   .6216804    -3.16   0.002    -3.181123   -.7441807
                          -------------+----------------------------------------------------------------
                          Europe       |
                                   sex |   .5388441   .4525279     1.19   0.234    -.3480942    1.425782
                                income |   .0273669    .013787     1.98   0.047      .000345    .0543889
                                 _cons |  -3.180029   .7546837    -4.21   0.000    -4.659182   -1.700876
                          ------------------------------------------------------------------------------
                          mlogit:
                          Code:
                          Multinomial logistic regression                   Number of obs   =        295
                                                                            LR chi2(4)      =      12.90
                                                                            Prob > chi2     =     0.0118
                          Log likelihood = -252.72012                       Pseudo R2       =     0.0249
                          
                          ------------------------------------------------------------------------------
                                   car |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                          -------------+----------------------------------------------------------------
                          American     |  (base outcome)
                          -------------+----------------------------------------------------------------
                          Japan        |
                                   sex |  -.4694798   .3114939    -1.51   0.132    -1.079997    .1410371
                                income |   .0276854   .0123666     2.24   0.025     .0034472    .0519236
                                 _cons |  -1.962651   .6216803    -3.16   0.002    -3.181122   -.7441801
                          -------------+----------------------------------------------------------------
                          Europe       |
                                   sex |   .5388443   .4525278     1.19   0.234     -.348094    1.425783
                                income |    .027367    .013787     1.98   0.047      .000345    .0543889
                                 _cons |   -3.18003   .7546837    -4.21   0.000    -4.659182   -1.700877
                          ------------------------------------------------------------------------------
                          Last edited by ben earnhart; 03 Dec 2014, 08:59.


                          • #14
                            Here is (integrated) code that adds choices instead of observations. The results are stored as m2, alongside the original results (m1) and those from the model with added observations (m3).

                            Code:
                            webuse choice ,clear
                            
                            * recode data so that 10% of sample always choose a Japanese
                            * car even if they also choose US or European
                            set seed 12345
                            bysort id (car): egen rnd = total(runiform()*(_n==_N))
                            replace choice = 1 if car==2 & rnd <.1
                            
                            asclogit choice dealer, case(id) alternatives(car) casevars(sex income)
                            est sto m1
                            
                            // adding alternatives
                            preserve
                            keep id car choice
                            reshape wide choice ,i(id) j(car)
                            g byte choice4 = (choice1 == 1) & (choice2 == 1)
                            g byte choice5 = (choice3 == 1) & (choice2 == 1)
                            replace choice2 = 0 if (choice4 == 1)
                            replace choice3 = 0 if (choice5 == 1)
                            reshape long choice ,i(id) j(car)
                            tempfile tmp
                            sa `tmp'
                            restore
                            
                            preserve
                            
                            drop car choice
                            mer m:m id using `tmp' ,nogen 
                                // yes m:m merges are in general a bad idea
                            la de nation 4 "US and Japan" 5 "European and Japan" ,modify
                            
                            asclogit choice dealer, case(id) alternatives(car) casevars(sex income)
                            est sto m2
                            
                            restore
                            
                            * expand data so that there is one group per alternative
                            expandcl 3, generate(newid) cluster(id)
                            
                            * recode choice variable so that only one alternative
                            * is chosen per group and drop groups with no chosen alternatives
                            bysort id newid (car): gen dupno = _n==1
                            bysort id (newid car): replace dupno = sum(dupno)
                            replace choice = 0 if dupno==1 &  (car==2 | car==3)
                            replace choice = 0 if dupno==2 &  (car==1 | car==3)
                            replace choice = 0 if dupno==3 &  (car==1 | car==2)
                            bysort id newid (car): egen nchoice = total(choice)
                            drop if nchoice==0
                            
                            asclogit choice dealer, case(newid) alternatives(car) casevars(sex income)
                            
                            est sto m3
                            
                            est tab m1 m2 m3
                            Best
                            Daniel


                            • #15
                              Daniel

                              Thanks. I agree that your suggestion is a possible alternative solution that is more consistent with the underlying random utility model. Just one comment: I don't think your code as it stands deals with the issue of which values to assign to the alternative-specific dealer variable for the new "US and Japan" and "Europe and Japan" alternatives. One solution in this example might be to assign the value of dealer for US plus the value for Japan to "US and Japan" and the value for Europe plus the value for Japan to "Europe and Japan". More generally though your solution raises the new challenge of assigning appropriate values for alternative-specific variables to the new alternatives, which might be tricky. Another issue in this specific example is that since "US and Japan" and "Europe and Japan" are chosen by few individuals (in our made-up data) the coefficients for the individual-specific variables are very imprecisely estimated [Edit: individuals are also randomly assigned to these alternatives, of course, which is more important here. But creating new alternatives could mean that some alternatives are chosen by very few individuals, if only a small number of individuals chose more than one alternative].
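
                              A minimal sketch of that assignment follows. It assumes the long data already contain five rows per id (car = 1 US, 2 Japan, 3 Europe, 4 "US and Japan", 5 "Europe and Japan") and that summing the dealer values is considered sensible.

                              Code:
                              * within each id the rows are sorted by car, so dealer[1], dealer[2]
                              * and dealer[3] refer to the US, Japan and Europe rows
                              bysort id (car): replace dealer = dealer[1] + dealer[2] if car == 4
                              bysort id (car): replace dealer = dealer[3] + dealer[2] if car == 5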

                              Arne
                              Last edited by Arne Risa Hole; 03 Dec 2014, 10:30.
