one category omitted because of collinearity in fixed effect model

Jiao Guo

Join Date: Jun 2020

Posts: 3
#1


16 Mar 2021, 08:45

Dear all,

I am running a linear probability model with fixed effects on a panel data set. My dependent variable is the mother's labor outcome (dummy: working is coded 1, otherwise 0), and the key independent variable is the child's education stage (categorical: no children as the reference group, baby, pre-school, primary school, and secondary school). I also include time fixed effects (waven is the survey wave).

My code is as follows:

Code:

xtset desampleid waven
xtreg work i.c_stage i.waven, fe

However, Stata always reports "note: 4.c_stage omitted because of collinearity", even after I remove the variable "waven". When I switch to a random-effects model, 4.c_stage is not omitted. I have searched the forum, but my case seems unusual in that only one category is omitted. I am wondering why.

Your answers and suggestions would be greatly appreciated. Thank you very much!
Tags: categorical, fixed effects, panel data, Time Series
Rhys Williams

Join Date: Apr 2020

Posts: 224
#2

16 Mar 2021, 08:52

It looks like it is just removing one category of the variable "waven" (the fourth category). This makes sense to avoid the dummy variable trap, i.e. it isn't possible to control for all possibilities of a categorical variable. The omitted category is the baseline.

Consider a simpler example in which you have a dummy variable for gender (assume male or female) and you regress depvar on i.gender. In this case, Stata will omit one of the categories (say, male) so that male is the baseline, and the coefficient on gender (female) tells you the difference in depvar for a woman relative to a man.

You can change which category Stata omits by using

Code:

ib2.waven

to select the second category (change as required).
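For reference, a minimal sketch of the factor-variable syntax for choosing a base level, reusing the variable names from the original post. Note that this only changes which category serves as the reference; it is not expected to remove an omission that is caused by genuine collinearity.

```stata
* Sketch only; variable names are taken from the original post.
xtset desampleid waven

* ib#. sets the base to the category coded #.
xtreg work i.c_stage ib2.waven, fe

* ib(first). and ib(last). pick the lowest or highest code;
* ib(freq). picks the most frequent category.
xtreg work i.c_stage ib(last).waven, fe

* fvset stores a default base level with the variable,
* so a plain i.waven then uses wave 2 as the base.
fvset base 2 waven
xtreg work i.c_stage i.waven, fe
```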

Best,
Rhys

Jiao Guo

Join Date: Jun 2020
Posts: 3

16 Mar 2021, 09:04

Hi Rhys,

Thanks for your reply!
But that does not seem to be my case. For the key independent variable "c_stage", "no children" is the baseline. In addition to the baseline, another category, "secondary school" (4.c_stage), is omitted, even after I remove the time fixed effect "waven".

Here is the result (sorry for the ugly format):

Code:

. xtreg work i.c_stage ib2.waven, fe
note: 4.c_stage omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =      1,512
Group variable: desampleid                      Number of groups  =        599

R-sq:                                           Obs per group:
     within  = 0.0167                                         min =          2
     between = 0.0343                                         avg =        2.5
     overall = 0.0203                                         max =          4

                                                F(7,906)          =       2.20
corr(u_i, Xb)  = 0.0695                         Prob > F          =     0.0319

------------------------------------------------------------------------------
        work |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     c_stage |
       Baby  |  -.0127687   .0930135    -0.14   0.891    -.1953156    .1697782
 Pre School  |  -.0404686   .0724963    -0.56   0.577    -.1827487    .1018116
Primary S..  |  -.0721042   .0438262    -1.65   0.100    -.1581169    .0139086
Secondary..  |          0  (omitted)
             |
       waven |
          1  |  -.0420507   .0251848    -1.67   0.095     -.091478    .0073766
          3  |  -.0281702   .0444936    -0.63   0.527    -.1154926    .0591523
          4  |  -.0210477   .0285602    -0.74   0.461    -.0770997    .0350042
          5  |   .0425673   .0376368     1.13   0.258    -.0312982    .1164328
             |
       _cons |    .650658   .0372689    17.46   0.000     .5775146    .7238014
-------------+----------------------------------------------------------------
     sigma_u |  .42134789
     sigma_e |  .31526713
         rho |  .64108513   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(598, 906) = 4.40                    Prob > F = 0.0000

Thanks,
Jiao


Rhys Williams

Join Date: Apr 2020
Posts: 224

16 Mar 2021, 09:11

Hi Jiao,

Is there unique variation in this fourth category? Or is there a perfect linear relationship between the fourth and the other categories?
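One possible way to check this in Stata is sketched below. It assumes that c_stage is coded so that the lowest value is "no child" and the highest is "secondary school", which the thread does not state explicitly. The names stage1 to stage5, anystage, lo_any and hi_any are hypothetical helpers introduced only for this illustration.

```stata
* Sketch only. Variable names follow the thread; stage*, anystage,
* lo_any and hi_any are hypothetical helper names.
xtset desampleid waven

* One 0/1 indicator per category of c_stage: stage1 is the lowest code
* (assumed "no child"), stage5 the highest (assumed "secondary school").
tabulate c_stage, generate(stage)

* xtsum splits each indicator's variation into between and within parts.
* A within standard deviation of zero means the indicator never changes
* for any mother, so the fixed effects absorb it completely.
xtsum stage1-stage5

* The four child-stage indicators add up to 1 - stage1. If that sum is
* constant within every mother, the indicators are perfectly collinear
* once the fixed effects are in the model.
generate byte anystage = stage2 + stage3 + stage4 + stage5
bysort desampleid: egen byte lo_any = min(anystage)
bysort desampleid: egen byte hi_any = max(anystage)
count if lo_any != hi_any
```

If the final count is zero, no mother ever moves between "no child" and a child stage, and the sum of the child-stage dummies is a perfect linear function of the individual fixed effects.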

Best,
Rhys


Jiao Guo

Join Date: Jun 2020

Posts: 3
#5

16 Mar 2021, 09:51

Hi Rhys,

I am not sure what "a perfect linear relationship between the fourth and the other categories" means. Could you please explain more? Thank you!

By the way, I may have found the problem in my data set. Using the xttab command, I found that, for my reference group (no child), the 140 respondents (342 observations) who had no children in at least one observation (see the Between column) had no children in 100% of their observations (see the Within column). Does this mean there is no variation? In any case, after I dropped the reference group, no category is omitted anymore.

Code:

. xttab c_stage

                  Overall             Between            Within
  c_stage |    Freq.  Percent      Freq.  Percent        Percent
----------+-----------------------------------------------------
 No child |     342     22.62       140     23.37         100.00
     Baby |      91      6.02        75     12.52          47.78
 Pre Scho |     318     21.03       191     31.89          65.45
  Primary |     389     25.73       247     41.24          58.94
 Secondar |     372     24.60       206     34.39          74.07
----------+-----------------------------------------------------
    Total |    1512    100.00       859    143.41          69.73
                               (n = 599)

Looking forward to more discussion!

Thanks,
Jiao
Rhys Williams

Join Date: Apr 2020

Posts: 224
#6

16 Mar 2021, 10:23

Hi Jiao,

Yes, that's the problem: "no child" has no within-person variation, so it cannot act as the baseline group in the fixed-effects model, and secondary school ends up as the baseline instead.
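A short derivation may make the collinearity concrete. It is a sketch under one assumption read off the xttab output: every mother observed with no child has no child in all of her waves. The notation ($D_{k,it}$, $d_i$) is introduced here for illustration and does not come from the thread.

```latex
% D_{k,it} = 1 if mother i is in child stage k at wave t (k = 0 is "no child").
% Assumption from xttab: D_{0,it} = d_i does not change over t.
\begin{align*}
\sum_{k=0}^{4} D_{k,it} &= 1
  \quad\Longrightarrow\quad
  \sum_{k=1}^{4} D_{k,it} = 1 - d_i , \\
\ddot{D}_{k,it} &= D_{k,it} - \bar{D}_{k,i}
  \qquad \text{(within transformation)} , \\
\sum_{k=1}^{4} \ddot{D}_{k,it} &= (1 - d_i) - (1 - d_i) = 0 .
\end{align*}
```

After demeaning, the four child-stage dummies are therefore linearly dependent, and one of them has to be dropped. The remaining coefficients are then contrasts with the dropped category rather than with "no child". A random-effects model only partially demeans the data, which is consistent with 4.c_stage not being omitted there.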

Glad you identified the issue.

Best,
Rhys