  • Logistic regression and Error: Factor variable base category conflict

    Hello all,

    I'm running a (simple) hierarchical logistic regression model in two steps: the main effects in the first step and the interaction in the second. Here x1 is a dichotomous variable and x2 is a categorical variable with three categories (cat0, cat1, cat2).

    I believe there are two ways to run the model. The first:

    Code:
    nestreg, lr: logistic y (x1 i.x2) (x1#i.x2)
    which produces this output:

    Code:
    nestreg, lr: logistic y (x1 i.x2) (x1#i.x2)
    note: 0.x2 omitted because of estimability.
    note: 0.x1#0.x2 omitted because of estimability.
    note: 0.x1#1.x2 omitted because of estimability.
    note: 0.x1#2.x2 omitted because of estimability.
    note: 1.x1#2.x2 omitted because of estimability.
    
    Block 1: x1 1.x2 2.x2
    
    Logistic regression                                     Number of obs =    400
                                                            LR chi2(3)    =  29.21
                                                            Prob > chi2   = 0.0000
    Log likelihood = -247.88975                             Pseudo R2     = 0.0556
    
    ------------------------------------------------------------------------------
               y | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
              x1 |   1.754175   .3795349     2.60   0.009     1.147905    2.680647
                 |
              x2 |
         x2cat1  |   3.382582   .9389867     4.39   0.000     1.963177    5.828239
         x2cat2  |   2.701352   .7499701     3.58   0.000     1.567704     4.65477
                 |
           _cons |   .1963767   .0488368    -6.55   0.000     .1206161    .3197237
    ------------------------------------------------------------------------------
    Note: _cons estimates baseline odds.
    
    Block 2: 1.x1#0b.x2 1.x1#1.x2
    
    Logistic regression                                     Number of obs =    400
                                                            LR chi2(5)    =  29.66
                                                            Prob > chi2   = 0.0000
    Log likelihood = -247.66862                             Pseudo R2     = 0.0565
    
    --------------------------------------------------------------------------------
                 y | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
    ---------------+----------------------------------------------------------------
                x1 |   2.045455   .7275044     2.01   0.044     1.018695    4.107104
                   |
                x2 |
           x2cat1  |          3   1.222125     2.70   0.007     1.350091    6.666218
           x2cat2  |        2.2   .9058248     1.91   0.055     .9816354    4.930548
                   |
             x1#x2 |
    x1cat1#x2cat0  |   .6901961   .3845699    -0.67   0.506     .2315752    2.057088
    x1cat1#x2cat1  |   .8516129     .42635    -0.32   0.748     .3192259    2.271885
                   |
             _cons |   .2222222   .0709205    -4.71   0.000     .1188866    .4153765
    --------------------------------------------------------------------------------
    Here x2cat0 is the base level in Block 1, and it should remain so. However, Block 2 (the interaction) uses x2cat2 as the base level. When I try to specify otherwise:

    Code:
    nestreg, lr: logistic y (x1 i.x2) (x1#ib0.x2)
    it doesn't change anything: x2cat2 is still the base level in Block 2. If I specify x2cat1 or x2cat2 as the base instead, I get the error:

    Code:
    x2: factor variable base category conflict
    So I assume I'm simply reading the output incorrectly, and that it is in fact using cat0 as the base level in Block 2. If so, can you explain how I should interpret this output?
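
One way to see which terms Block 2 actually estimates is to refit the same specification outside -nestreg- and ask for the coefficient legend. The sketch below uses the lbw example data instead of the data in the question (an assumption: smoke stands in for x1 and race for x2), so the exact terms Stata omits may differ. The point to look for is whether the i.race main effects keep their usual base level while the missing interaction cell is simply one that is collinear with the unprefixed smoke main effect.

```stata
* Illustration on the lbw example data; smoke and race are stand-ins
* for x1 and x2 from the question (assumption, not the poster's data)
webuse lbw, clear

* smoke has no i. prefix in the main effects, as x1 does in the question,
* but inside smoke#i.race it is treated as a factor variable
logistic low smoke i.race smoke#i.race, coeflegend
```

The legend lists every coefficient by name (for example _b[2.race]) and flags omitted terms, which makes it easier to tell a base level from a term dropped for collinearity.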

    The second approach:

    From what I understand:

    Code:
    nestreg, lr: logistic y x1##x2
    might be a possible solution. Its final output is:

    Code:
    Block 5: 1.x1#2.x2
    
    Logistic regression                                     Number of obs =    400
                                                            LR chi2(5)    =  29.66
                                                            Prob > chi2   = 0.0000
    Log likelihood = -247.66862                             Pseudo R2     = 0.0565
    
    --------------------------------------------------------------------------------
                 y | Odds ratio   Std. err.      z    P>|z|     [95% conf. interval]
    ---------------+----------------------------------------------------------------
                x1 |
           x1cat1  |   1.411765   .6055133     0.80   0.421     .6090845    3.272255
                   |
                x2 |
           x2cat1  |          3   1.222125     2.70   0.007     1.350091    6.666218
           x2cat2  |        2.2   .9058248     1.91   0.055     .9816354    4.930548
                   |
             x1#x2 |
    x1cat1#x2cat1  |   1.233871   .6848796     0.38   0.705     .4157161    3.662205
    x1cat1#x2cat2  |   1.448864   .8072914     0.67   0.506      .486124    4.318252
                   |
             _cons |   .2222222   .0709205    -4.71   0.000     .1188866    .4153765
    --------------------------------------------------------------------------------
    This uses x2cat0 as the base level throughout the model. However, the output values (odds ratio, std. err., z, p, etc.) for x1cat1 are very different from those in the first model. If both models are using x1cat0 as the base reference, shouldn't these be the same? It makes me believe that only one of these two models is appropriate here, but I'm not sure which one.

    What do you think?

    Thanks!
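
The two final tables appear to describe the same fitted model in two parameterizations: both report the same log likelihood (-247.66862) and the same odds ratios for x2cat1, x2cat2 and _cons. As a rough check using only the numbers posted above, the x1 row of the first table looks like the effect of x1 when x2 = cat2 (the cell whose interaction term was omitted), while x1cat1 in the second table is the effect of x1 when x2 = cat0. The scalar names below are made up for this illustration.

```stata
* Odds ratios copied from the two final tables in the question
* m1 = (x1 i.x2) (x1#i.x2) fit, m2 = x1##x2 fit
scalar or_x1_m1   = 2.045455
scalar or_int0_m1 = .6901961
scalar or_int1_m1 = .8516129
scalar or_x1_m2   = 1.411765
scalar or_int1_m2 = 1.233871
scalar or_int2_m2 = 1.448864

* Effect of x1 at x2 = cat0, rebuilt from m1; compare with or_x1_m2
display or_x1_m1 * or_int0_m1

* Effect of x1 at x2 = cat2, rebuilt from m2; compare with or_x1_m1
display or_x1_m2 * or_int2_m2

* m1 interaction for cat1, rebuilt from m2; compare with or_int1_m1
display or_int1_m2 / or_int2_m2
```

The products agree with the posted values to within rounding, which suggests the x1 rows differ only in which x2 category they refer to, not in the base level used for x2.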

  • #2
    I think you may need to specify the reference category in both places. -nestreg- hides the fact that you are carrying out two separate regressions here and I think it does not know how to tell Stata to carry the choices made in the first regression forward into the second. The following works:
    Code:
    webuse lbw, clear
    
    nestreg, lr: logistic low (ib2.race i.smoke) (ib2.race#i.smoke)
    and forces Stata to use category 2 (Black) for race as the reference in both regressions.
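
As a follow-up sketch on the same lbw data, the nested fit can be compared with the full factorial model fitted directly, and -lincom- can recover the odds ratio for one variable within a non-base category of the other. For the variables in the original question the analogous command would presumably be nestreg, lr: logistic y (i.x1 ib0.x2) (i.x1#ib0.x2); that is an assumption, since the poster's data are not available here.

```stata
webuse lbw, clear

* Same base level (race == 2) and explicit factor prefixes in both blocks
nestreg, lr: logistic low (ib2.race i.smoke) (ib2.race#i.smoke)

* The same final model fitted in one step, for comparison with Block 2
logistic low ib2.race##i.smoke

* Odds ratio for smoking among race == 3: main effect times interaction
lincom 1.smoke + 3.race#1.smoke, or
```

With the base level fixed this way, the 1.smoke row is the odds ratio for smoking in the base race category, and each interaction row is the ratio by which that odds ratio changes in another category.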



    • #3
      Originally posted by Clyde Schechter
      I think you may need to specify the reference category in both places. -nestreg- hides the fact that you are carrying out two separate regressions here and I think it does not know how to tell Stata to carry the choices made in the first regression forward into the second. The following works:
      Code:
      webuse lbw, clear
      
      nestreg, lr: logistic low (ib2.race i.smoke) (ib2.race#i.smoke)
      and forces Stata to use category 2 (Black) for race as the reference in both regressions.
      This worked. Thank you!
