  • one category omitted because of collinearity in fixed effect model

    Dear all,

    I am running a linear probability model with fixed effects on a panel data set. My dependent variable is the mother's labor outcome (dummy: working is coded as 1, otherwise 0), and the key independent variable is the child's education stage (categorical: no children as the reference group, baby, pre-school, primary school, and secondary school). I also include time fixed effects (waven is the survey wave).

    My code is as follows:
    Code:
    xtset desampleid waven
    xtreg work i.c_stage i.waven, fe
    However, Stata always reports "note: 4.c_stage omitted because of collinearity", even after I remove the variable "waven". When I switch to a random-effects model, 4.c_stage is not omitted. I have searched the forum, but my case seems unusual in that only one category is omitted. I am wondering why.

    Your answers and suggestions would be greatly appreciated. Thank you very much!

  • #2
    It looks like it is just removing one category of the variable "waven" (the fourth category). This makes sense to avoid the dummy variable trap, i.e. it isn't possible to control for all possibilities of a categorical variable. The omitted category is the baseline.

    Consider a simpler example: you have a dummy variable for gender (male or female) and you regress depvar on i.gender. Stata will omit one of the categories (say, male), so that male is the baseline and the coefficient on female tells you the difference in depvar for women relative to men.

    You can change which category Stata omits by using
    Code:
    ib2.waven
    to select the second category (change as required).
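
    As a rough illustration with the variable names from the first post, the base level of either factor variable can be set with the ib#. operator, or persistently with fvset. The level numbers used here are assumptions about how c_stage and waven are coded.

    ```stata
    * Use wave 2 as the base for the time dummies (assumes waven takes the value 2)
    xtreg work i.c_stage ib2.waven, fe

    * Use level 3 of c_stage as the base instead of the lowest level
    * (assumes 3 is a valid code of c_stage)
    xtreg work ib3.c_stage i.waven, fe

    * Alternatively, set the base once so that every later i.c_stage uses it
    fvset base 3 c_stage
    xtreg work i.c_stage i.waven, fe
    ```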

    Best,
    Rhys



    • #3
      Hi Rhys,

      Thanks for your reply!
      But that does not seem to be my case. For the key independent variable "c_stage", "no children" is the baseline. In addition to the baseline, another category, "secondary school" (4.c_stage), is omitted, even when I remove the time fixed effect "waven".

      Here is the result (sorry for the ugly format):
      Code:
      . xtreg work i.c_stage ib2.waven, fe
      note: 4.c_stage omitted because of collinearity

      Fixed-effects (within) regression               Number of obs     =      1,512
      Group variable: desampleid                      Number of groups  =        599

      R-sq:                                           Obs per group:
           within  = 0.0167                                         min =          2
           between = 0.0343                                         avg =        2.5
           overall = 0.0203                                         max =          4

                                                      F(7,906)          =       2.20
      corr(u_i, Xb)  = 0.0695                         Prob > F          =     0.0319

      ------------------------------------------------------------------------------
              work |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
      -------------+----------------------------------------------------------------
           c_stage |
             Baby  |  -.0127687   .0930135    -0.14   0.891    -.1953156    .1697782
       Pre School  |  -.0404686   .0724963    -0.56   0.577    -.1827487    .1018116
      Primary S..  |  -.0721042   .0438262    -1.65   0.100    -.1581169    .0139086
      Secondary..  |          0  (omitted)
                   |
             waven |
                1  |  -.0420507   .0251848    -1.67   0.095     -.091478    .0073766
                3  |  -.0281702   .0444936    -0.63   0.527    -.1154926    .0591523
                4  |  -.0210477   .0285602    -0.74   0.461    -.0770997    .0350042
                5  |   .0425673   .0376368     1.13   0.258    -.0312982    .1164328
                   |
             _cons |    .650658   .0372689    17.46   0.000     .5775146    .7238014
      -------------+----------------------------------------------------------------
           sigma_u |  .42134789
           sigma_e |  .31526713
               rho |  .64108513   (fraction of variance due to u_i)
      ------------------------------------------------------------------------------
      F test that all u_i=0: F(598, 906) = 4.40                    Prob > F = 0.0000
      Thanks,
      Jiao



      • #4
        Hi Jiao,

        Is there unique variation in this fourth category? Or is there a perfect linear relationship between the fourth and the other categories?
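
        One way to check this, sketched with the variable names from the first post, is to look at how often respondents move between categories. Both commands assume the data are xtset as shown.

        ```stata
        xtset desampleid waven

        * Between and within breakdown of each category; a Within percent of 100
        * means that respondents who are ever in a category are always in it
        xttab c_stage

        * Transition counts between categories from one wave to the next
        xttrans c_stage, freq
        ```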

        Best,
        Rhys



        • #5
          Hi Rhys,

          I am not sure what "a perfect linear relationship between the fourth and the other categories" means. Could you please explain more? Thank you!

          By the way, I may have found the problem in my data set. Using the command xttab, I found that for my reference group (no child), the 140 respondents who had no children in at least one observation (see the Between column, 342 observations in total) had no children in 100% of their observations (see the Within column). Does this mean there is no variation? In any case, after I dropped the reference group, no category is omitted anymore.

          Code:
          . xttab c_stage

                            Overall             Between            Within
            c_stage |    Freq.  Percent      Freq.  Percent        Percent
          ----------+-----------------------------------------------------
           No child |     342     22.62       140     23.37         100.00
               Baby |      91      6.02        75     12.52          47.78
           Pre Scho |     318     21.03       191     31.89          65.45
            Primary |     389     25.73       247     41.24          58.94
           Secondar |     372     24.60       206     34.39          74.07
          ----------+-----------------------------------------------------
              Total |    1512    100.00       859    143.41          69.73
                                         (n = 599)
          Looking forward to more discussion!

          Thanks,
          Jiao



          • #6
            Hi Jiao,

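            Yes, that's the problem. There is no within-respondent variation in "no child", so it cannot act as the baseline in a fixed-effects model: respondents who are always childless are absorbed by their own fixed effect. For everyone else, the four remaining category dummies always sum to one, which is a perfect linear relationship, so Stata drops one of them (secondary school) and that category effectively becomes the baseline.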

            Glad you identified the issue.
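
            A minimal sketch of how this could be confirmed and made explicit, under the assumption that c_stage is coded 0 for "no child" and 4 for "secondary school" (the note about 4.c_stage suggests the latter; the former is a guess). The variables ever_nochild and always_nochild are hypothetical helper names.

            ```stata
            * Assumption: c_stage == 0 is "no child" and c_stage == 4 is "secondary school"
            bysort desampleid: egen byte ever_nochild   = max(c_stage == 0)
            bysort desampleid: egen byte always_nochild = min(c_stage == 0)

            * Observations from respondents who are in "no child" in some waves but not all;
            * a count of 0 means "no child" has no within-respondent variation
            count if ever_nochild == 1 & always_nochild == 0

            * Among respondents who ever have a child, make the effective base explicit
            xtreg work ib4.c_stage i.waven if always_nochild == 0, fe
            ```

            Stating the base with ib4. keeps the output free of the collinearity note and makes clear that the reported coefficients are relative to secondary school.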

            Best,
            Rhys
