  • Problem with base attribute in clogit

    Hello everyone,

    I have an issue with the results of a clogit model.

    I have 5 attributes: a, b, c, d, and e. My clogit command is:

    clogit choice a b c d e, group(gid) cluster(ID)
    However, the results showed coefficients for ALL five attributes, while I think only four should appear, with one serving as the base attribute. I would like to see how the different attributes contribute to preferences. I coded them all as dummy variables, but I am not sure why this happened.

    Any advice would be really appreciated.

    Thanks a lot.

  • #2
    Vince:
    what you expect is driven by collinearity, which does not seem to bite in your example, whereas it does in the following one:
    Code:
    . use https://www.stata-press.com/data/r17/lowbirth2, clear
    (Applied Logistic Regression, Hosmer & Lemeshow)
    
    . clogit low lwt smoke ptd ht ui i.race, group(pairid) nolog
    
    Conditional (fixed-effects) logistic regression         Number of obs =    112
                                                            LR chi2(7)    =  26.04
                                                            Prob > chi2   = 0.0005
    Log likelihood = -25.794271                             Pseudo R2     = 0.3355
    
    ------------------------------------------------------------------------------
             low | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             lwt |  -.0183757   .0100806    -1.82   0.068    -.0381333    .0013819
           smoke |   1.400656   .6278396     2.23   0.026     .1701131    2.631199
             ptd |   1.808009   .7886502     2.29   0.022     .2622828    3.353735
              ht |   2.361152   1.086128     2.17   0.030     .2323796    4.489924
              ui |   1.401929   .6961585     2.01   0.044     .0374836    2.766375
                 |
            race |
          Black  |   .5713643    .689645     0.83   0.407    -.7803149    1.923044
          Other  |  -.0253148   .6992044    -0.04   0.971     -1.39573    1.345101
    ------------------------------------------------------------------------------
    
    . label list race
    race:
               1 White
               2 Black
               3 Other
    
    . g race_white=1 if race==1
    (68 missing values generated)
    
    . replace race_white=0 if race_white==.
    (68 real changes made)
    
    . g race_black=1 if race==2
    (91 missing values generated)
    
    . replace race_black =0 if race_black==.
    (91 real changes made)
    
    . g race_other=1 if race==3
    (65 missing values generated)
    
    . replace race_other =0 if race_other==.
    (65 real changes made)
    
    . clogit low lwt smoke ptd ht ui race_white race_black race_other , group(pairid) nolog
    note: race_other omitted because of collinearity.
    
    Conditional (fixed-effects) logistic regression         Number of obs =    112
                                                            LR chi2(7)    =  26.04
                                                            Prob > chi2   = 0.0005
    Log likelihood = -25.794271                             Pseudo R2     = 0.3355
    
    ------------------------------------------------------------------------------
             low | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             lwt |  -.0183757   .0100806    -1.82   0.068    -.0381333    .0013819
           smoke |   1.400656   .6278396     2.23   0.026     .1701131    2.631199
             ptd |   1.808009   .7886502     2.29   0.022     .2622828    3.353735
              ht |   2.361152   1.086128     2.17   0.030     .2323796    4.489924
              ui |   1.401929   .6961585     2.01   0.044     .0374836    2.766375
      race_white |   .0253148   .6992044     0.04   0.971    -1.345101     1.39573
      race_black |   .5966791    .737698     0.81   0.419    -.8491824    2.042541
      race_other |          0  (omitted)
    ------------------------------------------------------------------------------
    
    .
    Can't you gather attributes a-d into one categorical variable and use -i.attribute- on the right-hand side of your regression equation?
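    For instance, a minimal sketch of that approach, assuming a-d are mutually exclusive 0/1 indicators (the -attribute- variable and -attrlbl- label below are hypothetical names):
    Code:
    * combine mutually exclusive 0/1 dummies a-d into one categorical variable
    * (assumes each observation flags at most one of a-d; 0 = none of them)
    generate attribute = 1*a + 2*b + 3*c + 4*d
    label define attrlbl 0 "none" 1 "a" 2 "b" 3 "c" 4 "d"
    label values attribute attrlbl
    clogit choice i.attribute e, group(gid) cluster(ID)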
    Last edited by Carlo Lazzaro; 15 Sep 2023, 01:24.
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Originally posted by Carlo Lazzaro View Post
      Can't you gather attributes a-d into one categorical variable and use -i.attribute- on the right-hand side of your regression equation?
      Thanks Carlo. I am confused about how to gather all attributes into one categorical variable. Let's say attribute a has values 1 and 2, b has values 1, 2, and 3, and so on. May I ask how I would combine these into one categorical variable?

      Comment


      • #4
        Vince:
        if the predictors show no collinearity problem, you can include them all on the right-hand side of your regression equation.
        That said, if the values 1, 2, 3, ..., n in your variables denote different levels of a given attribute, you may want to try:
        Code:
        clogit choice i.a i.b i.c i.d i.e, group(gid) cluster(ID)
        and see what happens.
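        If you also want to control which level of each attribute serves as the base, factor-variable notation lets you choose it with the ib#. operator, for instance (a sketch; the base levels 2 and 1 below are picked purely for illustration):
        Code:
        * ib2.a makes level 2 of a the base; ib1.b makes level 1 of b the base
        clogit choice ib2.a ib1.b i.c i.d i.e, group(gid) cluster(ID)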
        Kind regards,
        Carlo
        (Stata 19.0)

        Comment


        • #5
          Originally posted by Carlo Lazzaro View Post
          That said, if the values 1, 2, 3, ..., n in your variables denote different levels of a given attribute, you may want to try -clogit choice i.a i.b i.c i.d i.e, group(gid) cluster(ID)- and see what happens.
          Thanks Carlo. The results show that one level of each attribute is omitted as the reference level for that attribute. What I am looking for is output at the attribute level, with one attribute omitted as the base, so that I can see how respondents weighted the attributes against each other. Or is that impossible? I recall reading papers that report results by attribute only, but I might be wrong.

          Much appreciated Carlo!

          Comment


          • #6
            Vince:
            the results you obtained are as expected.
            Unfortunately, they do not seem to fulfil your research goals.
            I do not think that what you're after can be obtained (but I might be wrong).
            That said, you may want to consider different models with different specifications on the right-hand side of your regression equation.
            Kind regards,
            Carlo
            (Stata 19.0)

            Comment


            • #7
              Vince Vo: I don't quite follow what you have in mind when you write "What I am looking for is output at the attribute level, with one attribute omitted as the base, so that I can see how respondents weighted the attributes against each other." To facilitate our conversation, let's suppose that we're looking at a choice between two delivery pizzas, pizza A and pizza B. Suppose further that each pizza is described by two attributes, say price (with three levels: $8, $10, or $12) and delivery time (also with three levels: 20 minutes, 40 minutes, or 60 minutes). In this context, could you explain what parameters you're interested in estimating?

              Comment


              • #8
                Originally posted by Hong Il Yoo View Post
                In this context, could you explain what parameters you're interested in estimating?
                Thanks Hong for making it easier to explain. I'd like to see which attribute, price or delivery time, respondents value more (i.e., which one carries more weight in their preferences).

                Comment


                • #9
                  Originally posted by Carlo Lazzaro View Post
                  I do not think that what you're after can be obtained (but I might be wrong). That said, you may want to consider different models with different specifications on the right-hand side of your regression equation.
                  I think you are right. Years ago, when I worked on a choice-experiment dataset, I had the same issue and ended up with the solution you suggested above.

                  Comment


                  • #10
                    Originally posted by Vince Vo View Post
                    I'd like to see which attribute, price or delivery time, respondents value more.
                    That's a tricky issue because, without further assumptions, the only thing we can identify from choice models is the effect of level changes within an attribute. You may want to consult this helpful review article:

                    Gonzalez, J.M. A Guide to Measuring and Interpreting Attribute Importance. Patient 12, 287–295 (2019). https://doi.org/10.1007/s40271-019-00360-3

                    and see if any of the importance measures suits your requirements. As you'll see, there's no single way to define and measure the notion of importance.
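                    For instance, one measure discussed there is based on the range of the level effects within each attribute. A hypothetical sketch in the pizza example above (assuming price and time are coded 1-3 with level 1 as the base, that the level-3 coefficients are the extreme effects so the base-to-level-3 contrast is each attribute's range, and that both coefficients share the same sign):
                    Code:
                    * price's share of the summed coefficient ranges across attributes
                    clogit choice i.price i.time, group(gid)
                    nlcom (imp_price: _b[3.price] / (_b[3.price] + _b[3.time]))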

                    Comment
