Logistic regression and Error: Factor variable base category conflict

Michael Hojo

Join Date: May 2022
Posts: 3

25 May 2022, 11:10

Hello all,

I'm running a (simple) hierarchical logistic regression in two steps: the main effects in the first step and the interaction in the second. Here x1 is a dichotomous variable and x2 is a categorical variable with three categories (cat0, cat1, cat2).

I believe there are two approaches to run the model. The first:

Code:

nestreg, lr: logistic y (x1 i.x2) (x1#i.x2)

This produces the following output:

Code:

nestreg, lr: logistic y (x1 i.x2) (x1#i.x2)
note: 0.x2 omitted because of estimability.
note: 0.x1#0.x2 omitted because of estimability.
note: 0.x1#1.x2 omitted because of estimability.
note: 0.x1#2.x2 omitted because of estimability.
note: 1.x1#2.x2 omitted because of estimability.

Block 1: x1 1.x2 2.x2

Logistic regression                                     Number of obs =    400
                                                        LR chi2(3)    =  29.21
                                                        Prob > chi2   = 0.0000
Log likelihood = -247.88975                             Pseudo R2     = 0.0556

------------------------------------------------------------------------------
           y | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
          x1 |   1.754175   .3795349     2.60   0.009     1.147905    2.680647
             |
          x2 |
     x2cat1  |   3.382582   .9389867     4.39   0.000     1.963177    5.828239
     x2cat2  |   2.701352   .7499701     3.58   0.000     1.567704     4.65477
             |
       _cons |   .1963767   .0488368    -6.55   0.000     .1206161    .3197237
------------------------------------------------------------------------------
Note: _cons estimates baseline odds.

Block 2: 1.x1#0b.x2 1.x1#1.x2

Logistic regression                                     Number of obs =    400
                                                        LR chi2(5)    =  29.66
                                                        Prob > chi2   = 0.0000
Log likelihood = -247.66862                             Pseudo R2     = 0.0565

--------------------------------------------------------------------------------
             y | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
---------------+----------------------------------------------------------------
            x1 |   2.045455   .7275044     2.01   0.044     1.018695    4.107104
               |
            x2 |
       x2cat1  |          3   1.222125     2.70   0.007     1.350091    6.666218
       x2cat2  |        2.2   .9058248     1.91   0.055     .9816354    4.930548
               |
         x1#x2 |
x1cat1#x2cat0  |   .6901961   .3845699    -0.67   0.506     .2315752    2.057088
x1cat1#x2cat1  |   .8516129     .42635    -0.32   0.748     .3192259    2.271885
               |
         _cons |   .2222222   .0709205    -4.71   0.000     .1188866    .4153765
--------------------------------------------------------------------------------

Here x2cat0 is the base level in Block 1 (and should remain so). However, Block 2 (the interaction) uses x2cat2 as the base level. When I try to specify otherwise:

Code:

nestreg, lr: logistic y (x1 i.x2) (x1#ib0.x2)

This changes nothing: x2cat2 is still the base level in Block 2. If I specify x2cat1 or x2cat2 as the base instead, I get the error:

Code:

x2: factor variable base category conflict

So I assume I'm simply reading the model incorrectly, and that this output is in fact using cat0 as the base level in Block 2. If that is true, can you explain how I should interpret this output?

The second approach:

From what I understand:

Code:

nestreg, lr: logistic y x1##x2

might be a possible solution. Its final block of output is:

Code:

Block 5: 1.x1#2.x2

Logistic regression                                     Number of obs =    400
                                                        LR chi2(5)    =  29.66
                                                        Prob > chi2   = 0.0000
Log likelihood = -247.66862                             Pseudo R2     = 0.0565

--------------------------------------------------------------------------------
             y | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
---------------+----------------------------------------------------------------
            x1 |
       x1cat1  |   1.411765   .6055133     0.80   0.421     .6090845    3.272255
               |
            x2 |
       x2cat1  |          3   1.222125     2.70   0.007     1.350091    6.666218
       x2cat2  |        2.2   .9058248     1.91   0.055     .9816354    4.930548
               |
         x1#x2 |
x1cat1#x2cat1  |   1.233871   .6848796     0.38   0.705     .4157161    3.662205
x1cat1#x2cat2  |   1.448864   .8072914     0.67   0.506      .486124    4.318252
               |
         _cons |   .2222222   .0709205    -4.71   0.000     .1188866    .4153765
--------------------------------------------------------------------------------

This uses x2cat0 as the base level throughout the model. However, the output values (odds ratio, std. err., z, p, etc.) for x1cat1 are quite different from those in the first model. If both models are using x1cat0 as the base reference, shouldn't these be the same? It makes me believe that only one of these two models is appropriate here, but I'm not sure which one.
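
One hedged way to see how the two outputs relate: both final blocks report the same log likelihood (-247.66862) and LR chi2(5) = 29.66, which suggests they are the same fitted model written with different omitted interaction cells. Under that assumption, the x1 odds ratio in the first output (2.045455) would be the effect of x1 at x2cat2, while x1cat1 in the second output (1.411765) would be the effect of x1 at x2cat0. The sketch below only multiplies odds ratios copied from the tables above; it does not re-fit anything.

```stata
* Arithmetic check using odds ratios copied from the two outputs above.
* Assumption: both outputs are reparameterizations of one fitted model.

* Second output: OR of x1 at x2cat0, times the x1cat1#x2cat2 interaction OR.
* Compare with the x1 OR in the first output (2.045455).
display 1.411765 * 1.448864

* First output: OR of x1, times the x1cat1#x2cat0 interaction OR.
* Compare with the x1cat1 OR in the second output (1.411765).
display 2.045455 * .6901961

* Implied OR of x1 at x2cat1, computed from each output in turn.
* The two results should agree with each other up to rounding.
display 1.411765 * 1.233871
display 2.045455 * .8516129
```

If the products line up to rounding, the difference between the two x1 rows reflects which level of x2 the "main effect" of x1 refers to, rather than a difference in model fit.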

What do you think?

Thanks!


Clyde Schechter

Join Date: Apr 2014

Posts: 30356
#2

25 May 2022, 12:04

I think you may need to specify the reference category in both places. -nestreg- hides the fact that you are carrying out two separate regressions here and I think it does not know how to tell Stata to carry the choices made in the first regression forward into the second. The following works:

Code:

webuse lbw, clear
nestreg, lr: logistic low (ib2.race i.smoke) (ib2.race#i.smoke)

and forces Stata to use category 2 (Black) for race as the reference in both regressions.
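
To make the "two separate regressions" point concrete, here is a minimal sketch that fits the two steps by hand on the same lbw data and compares them with a likelihood-ratio test. It illustrates the idea only and is not a claim about what -nestreg- does internally; the stored-estimate names main and full are arbitrary.

```stata
webuse lbw, clear

* Step 1: main effects only, with race category 2 as the reference.
logistic low ib2.race i.smoke
estimates store main

* Step 2: add the interaction. The ## operator expands to the main
* effects plus the interaction, and ib2. keeps the same reference
* category for race as in step 1.
logistic low ib2.race##i.smoke
estimates store full

* Likelihood-ratio test of the interaction terms.
lrtest main full
```

Because the base level is stated explicitly in each command, both coefficient tables can be read against the same reference category.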
Michael Hojo

Join Date: May 2022

Posts: 3
#3

02 Jun 2022, 20:40

Originally posted by Clyde Schechter

I think you may need to specify the reference category in both places. -nestreg- hides the fact that you are carrying out two separate regressions here and I think it does not know how to tell Stata to carry the choices made in the first regression forward into the second. The following works:

Code:

webuse lbw, clear
nestreg, lr: logistic low (ib2.race i.smoke) (ib2.race#i.smoke)

and forces Stata to use category 2 (Black) for race as the reference in both regressions.

This worked. Thank you!