  • Problem with base attribute in clogit

    Hello everyone,

    I have an issue with the results of a clogit model.

    I have 5 attributes: a, b, c, d, and e. My clogit command is:

    clogit choice a b c d e, group(gid) cluster(ID)
    However, the results showed coefficients for ALL five attributes, while I think only four should appear, with one serving as the base attribute. I would like to see how the different attributes contribute to preferences. I coded them all as dummy variables, but I am not sure why this happened.

    Any advice would be really appreciated.

    Thanks a lot.

  • #2
    Vince:
    what you expect is driven by collinearity, which does not seem to bite in your example, whereas it does in the following one:
    Code:
    . use https://www.stata-press.com/data/r17/lowbirth2, clear
    (Applied Logistic Regression, Hosmer & Lemeshow)
    
    . clogit low lwt smoke ptd ht ui i.race, group(pairid) nolog
    
    Conditional (fixed-effects) logistic regression         Number of obs =    112
                                                            LR chi2(7)    =  26.04
                                                            Prob > chi2   = 0.0005
    Log likelihood = -25.794271                             Pseudo R2     = 0.3355
    
    ------------------------------------------------------------------------------
             low | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             lwt |  -.0183757   .0100806    -1.82   0.068    -.0381333    .0013819
           smoke |   1.400656   .6278396     2.23   0.026     .1701131    2.631199
             ptd |   1.808009   .7886502     2.29   0.022     .2622828    3.353735
              ht |   2.361152   1.086128     2.17   0.030     .2323796    4.489924
              ui |   1.401929   .6961585     2.01   0.044     .0374836    2.766375
                 |
            race |
          Black  |   .5713643    .689645     0.83   0.407    -.7803149    1.923044
          Other  |  -.0253148   .6992044    -0.04   0.971     -1.39573    1.345101
    ------------------------------------------------------------------------------
    
    . label list race
    race:
               1 White
               2 Black
               3 Other
    
    . g race_white=1 if race==1
    (68 missing values generated)
    
    . replace race_white=0 if race_white==.
    (68 real changes made)
    
    . g race_black=1 if race==2
    (91 missing values generated)
    
    . replace race_black =0 if race_black==.
    (91 real changes made)
    
    . g race_other=1 if race==3
    (65 missing values generated)
    
    . replace race_other =0 if race_other==.
    (65 real changes made)
    
    . clogit low lwt smoke ptd ht ui race_white race_black race_other , group(pairid) nolog
    note: race_other omitted because of collinearity.
    
    Conditional (fixed-effects) logistic regression         Number of obs =    112
                                                            LR chi2(7)    =  26.04
                                                            Prob > chi2   = 0.0005
    Log likelihood = -25.794271                             Pseudo R2     = 0.3355
    
    ------------------------------------------------------------------------------
             low | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
             lwt |  -.0183757   .0100806    -1.82   0.068    -.0381333    .0013819
           smoke |   1.400656   .6278396     2.23   0.026     .1701131    2.631199
             ptd |   1.808009   .7886502     2.29   0.022     .2622828    3.353735
              ht |   2.361152   1.086128     2.17   0.030     .2323796    4.489924
              ui |   1.401929   .6961585     2.01   0.044     .0374836    2.766375
      race_white |   .0253148   .6992044     0.04   0.971    -1.345101     1.39573
      race_black |   .5966791    .737698     0.81   0.419    -.8491824    2.042541
      race_other |          0  (omitted)
    ------------------------------------------------------------------------------
    
    .
    Can't you gather attributes a-d into one categorical variable and use -i.attribute- on the right-hand side of your regression equation?
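    For instance, a minimal sketch of that approach, assuming a-d are mutually exclusive 0/1 indicators (the -attribute- variable and -attrlbl- label below are hypothetical names):
    Code:
    * combine mutually exclusive 0/1 dummies a-d into one categorical variable
    * (assumes each observation flags at most one of a-d; 0 = none of them)
    generate attribute = 1*a + 2*b + 3*c + 4*d
    label define attrlbl 0 "none" 1 "a" 2 "b" 3 "c" 4 "d"
    label values attribute attrlbl
    clogit choice i.attribute e, group(gid) cluster(ID)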
    Last edited by Carlo Lazzaro; 15 Sep 2023, 01:24.
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Originally posted by Carlo Lazzaro View Post
      Can't you gather attributes a-d into one categorical variable and use -i.attribute- on the right-hand side of your regression equation?
      Thanks Carlo. I am confused about how to gather all attributes into one categorical variable. Let's say attribute a has values 1 and 2, b has values 1, 2, and 3, and so on. May I ask how I would combine these into one categorical variable?

      Comment


      • #4
        Vince:
        if the predictors show no collinearity problem, you can include them all on the right-hand side of your regression equation.
        That said, if the values 1, 2, 3, ..., n in your variables denote different levels of a given attribute, you may want to try:
        Code:
        clogit choice i.a i.b i.c i.d i.e, group(gid) cluster(ID)
        and see what happens.
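        If you also want to control which level of each attribute serves as the base, factor-variable notation lets you choose it with the ib#. operator, for instance (a sketch; the base levels 2 and 1 below are picked purely for illustration):
        Code:
        * ib2.a makes level 2 of a the base; ib1.b makes level 1 of b the base
        clogit choice ib2.a ib1.b i.c i.d i.e, group(gid) cluster(ID)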
        Kind regards,
        Carlo
        (Stata 19.0)

        Comment


        • #5
          Originally posted by Carlo Lazzaro View Post
          That said, if the values 1, 2, 3, ..., n in your variables denote different levels of a given attribute, you may want to try -clogit choice i.a i.b i.c i.d i.e, group(gid) cluster(ID)- and see what happens.
          Thanks Carlo. The results show that one level of each attribute is omitted as the reference level for that attribute. What I am looking for is output at the attribute level, with one attribute omitted as the base, so that I can see how respondents weighted the attributes against each other. Or is that impossible? I recall reading papers that report results by attribute only, but I might be wrong.

          Much appreciated Carlo!

          Comment


          • #6
            Vince:
            the results you obtained are as expected.
            Unfortunately, they do not seem to fulfil your research goals.
            I do not think that what you're after can be obtained (but I might be wrong).
            That said, you may want to consider different models with different specifications on the right-hand side of your regression equation.
            Kind regards,
            Carlo
            (Stata 19.0)

            Comment


            • #7
              Vince Vo: I don't quite follow what you have in mind when you write "What I am looking for is output at the attribute level, with one attribute omitted as the base, so that I can see how respondents weighted the attributes against each other." To facilitate our conversation, let's suppose that we're looking at a choice between two delivery pizzas, pizza A and pizza B. Suppose further that each pizza is described by two attributes, say price (with three levels: $8, $10, or $12) and delivery time (also with three levels: 20 minutes, 40 minutes, or 60 minutes). In this context, could you explain what parameters you're interested in estimating?

              Comment


              • #8
                Originally posted by Hong Il Yoo View Post
                In this context, could you explain what parameters you're interested in estimating?
                Thanks Hong for making it easier to explain. I'd like to see which attribute, price or delivery time, respondents value more (i.e., which one carries more weight in their preferences).

                Comment


                • #9
                  Originally posted by Carlo Lazzaro View Post
                  I do not think that what you're after can be obtained (but I might be wrong). That said, you may want to consider different models with different specifications on the right-hand side of your regression equation.
                  I think you are right. Years ago, when I worked on a choice-experiment dataset, I had the same issue and ended up with the solution you suggested above.

                  Comment


                  • #10
                    Originally posted by Vince Vo View Post
                    I'd like to see which attribute, price or delivery time, respondents value more.
                    That's a tricky issue because, without further assumptions, the only thing we can identify from choice models is the effect of level changes within an attribute. You may want to consult this helpful review article:

                    Gonzalez, J.M. A Guide to Measuring and Interpreting Attribute Importance. Patient 12, 287–295 (2019). https://doi.org/10.1007/s40271-019-00360-3

                    and see if any of the importance measures suits your requirements. As you'll see, there's no single way to define and measure the notion of importance.
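                    For instance, one measure discussed there is based on the range of the level effects within each attribute. A hypothetical sketch in the pizza example above (assuming price and time are coded 1-3 with level 1 as the base, that the level-3 coefficients are the extreme effects so the base-to-level-3 contrast is each attribute's range, and that both coefficients share the same sign):
                    Code:
                    * price's share of the summed coefficient ranges across attributes
                    clogit choice i.price i.time, group(gid)
                    nlcom (imp_price: _b[3.price] / (_b[3.price] + _b[3.time]))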

                    Comment
