Why are the variables omitted?

Carlo Lazzaro

Join Date: Apr 2014
Posts: 17724

#16

24 Jun 2022, 00:27

Carl:
replicating your code (with some tweaks to match the way predictors are named in the -nlswork.dta- file), we have:

Code:

. use "https://www.stata-press.com/data/r17/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)


. reg ln_wage union grade tenure i.year, robust

Linear regression                               Number of obs     =     19,008
                                                F(14, 18993)      =     616.93
                                                Prob > F          =     0.0000
                                                R-squared         =     0.3039
                                                Root MSE          =     .38985

------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       union |   .1514804     .00672    22.54   0.000     .1383087    .1646522
       grade |   .0790582   .0012815    61.69   0.000     .0765464      .08157
      tenure |   .0300961   .0007711    39.03   0.000     .0285846    .0316075
             |
        year |
         71  |   .0274538   .0161976     1.69   0.090    -.0042949    .0592025
         72  |   .0292378   .0152852     1.91   0.056    -.0007225    .0591981
         73  |   .0166856   .0159472     1.05   0.295    -.0145722    .0479435
         77  |  -.0133802   .0144448    -0.93   0.354    -.0416934     .014933
         78  |   .0456363   .0152336     3.00   0.003     .0157771    .0754955
         80  |   .0036626   .0151359     0.24   0.809    -.0260052    .0333304
         82  |  -.0103459   .0148349    -0.70   0.486    -.0394237    .0187319
         83  |   .0099641   .0156993     0.63   0.526    -.0208079     .040736
         85  |   .0412243   .0154251     2.67   0.008     .0109898    .0714588
         87  |   .0410003   .0155188     2.64   0.008     .0105822    .0714185
         88  |    .048031   .0163018     2.95   0.003      .016078    .0799841
             |
       _cons |   .5723746   .0195216    29.32   0.000     .5341106    .6106386
------------------------------------------------------------------------------

. xtreg ln_wage union grade tenure i.year, fe robust
note: grade omitted because of collinearity.

Fixed-effects (within) regression               Number of obs     =     19,008
Group variable: idcode                          Number of groups  =      4,132

R-squared:                                      Obs per group:
     Within  = 0.1282                                         min =          1
     Between = 0.1610                                         avg =        4.6
     Overall = 0.1340                                         max =         12

                                                F(13,4131)        =      97.42
corr(u_i, Xb) = 0.1429                          Prob > F          =     0.0000

                             (Std. err. adjusted for 4,132 clusters in idcode)
------------------------------------------------------------------------------
             |               Robust
     ln_wage | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       union |   .1004918    .009703    10.36   0.000     .0814687    .1195149
       grade |          0  (omitted)
      tenure |   .0172592   .0011662    14.80   0.000     .0149727    .0195456
             |
        year |
         71  |   .0257666   .0106191     2.43   0.015     .0049475    .0465858
         72  |   .0286456   .0120102     2.39   0.017     .0050991    .0521921
         73  |   .0279679   .0132397     2.11   0.035     .0020109    .0539249
         77  |   .0556208   .0144188     3.86   0.000     .0273522    .0838894
         78  |   .0936785   .0149516     6.27   0.000     .0643652    .1229918
         80  |   .0773018   .0154508     5.00   0.000     .0470099    .1075937
         82  |   .0906583   .0156842     5.78   0.000     .0599089    .1214077
         83  |   .1130978   .0160829     7.03   0.000     .0815667    .1446289
         85  |   .1470453   .0164796     8.92   0.000     .1147363    .1793542
         87  |    .166594   .0175077     9.52   0.000     .1322694    .2009186
         88  |   .1921114     .01866    10.30   0.000     .1555279     .228695
             |
       _cons |   1.566306   .0121762   128.64   0.000     1.542434    1.590178
-------------+----------------------------------------------------------------
     sigma_u |   .4055671
     sigma_e |  .25625658
         rho |  .71467812   (fraction of variance due to u_i)
------------------------------------------------------------------------------

.

Some comments on the above follow:
1) OLS code: it treats all the observations as independent. Standard errors (SEs) take only heteroskedasticity into account. No demeaning is applied.
2) -xtreg, fe- code: it takes the panel structure of your dataset into account. SEs allow for both heteroskedasticity and serial correlation (-robust- and -vce(cluster idcode)- can be used interchangeably with -xtreg-, but not with -regress-). Demeaning is applied: the within-panel mean of a time-invariant variable (such as -race-, or -grade- in the output above) equals the variable itself, so subtracting it leaves 0 for every observation and no coefficient is returned.
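
The two points above can be checked by hand. What follows is only a minimal sketch, assuming the same -nlswork.dta- file and the same estimation sample as the output above; the names -grade_mean- and -grade_dm- are made up for illustration.

```stata
use "https://www.stata-press.com/data/r17/nlswork.dta", clear

* Restrict to the observations used in the regressions above
keep if !missing(ln_wage, union, grade, tenure, year)

* Within-person mean of grade, then the demeaned (within-transformed) value
* (grade_mean and grade_dm are hypothetical helper names)
bysort idcode: egen double grade_mean = mean(grade)
generate double grade_dm = grade - grade_mean

* If grade never changes within idcode, grade_dm is 0 for every observation,
* which is why -xtreg, fe- has nothing left to estimate a coefficient from
summarize grade_dm

* With -xtreg, fe-, these two calls should report the same standard errors
xtreg ln_wage union tenure i.year, fe robust
xtreg ln_wage union tenure i.year, fe vce(cluster idcode)

* With -regress-, clustering on the panel has to be requested explicitly
regress ln_wage union grade tenure i.year, vce(cluster idcode)
```

If -summarize- reports a minimum and maximum of 0 for -grade_dm-, that is consistent with the "omitted because of collinearity" note for -grade-.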

Kind regards,
Carlo
(Stata 19.0)


Carl Baier

Join Date: Jun 2022

Posts: 13
#17

24 Jun 2022, 11:51

Hi Carlo, thanks. In the book "Introductory Econometrics - 6th Edition" by Wooldridge, I found the following on page 437:

"When we include a full set of year dummies—that is, year dummies for all years but the first—we cannot estimate the effect of any variable whose change across time is constant. An example is years of experience in a panel data set where each person works in every year, so that experience always increases by one in each year, for every person in the sample. The presence of a_i accounts for differences across people in their years of experience in the initial time period. But then the effect of a one-year increase in experience cannot be distinguished from the aggregate time effects (because experience increases by the same amount for everyone)."

Could this be the explanation for why one of the variables is removed in the FE estimator? Is there multicollinearity if all (8) variables are kept in the model, so that one variable has to be removed to get around it? (Wooldridge used all year dummy variables, which is why experience drops out for him and 1987 for me.)
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17724
#18

24 Jun 2022, 11:57

Carl:
you had two different omissions in your first -xtreg, fe- code:
1) -edu-, as it was time-invariant;
2) 1987 in -i.year- was omitted to protect your analysis from the so-called "dummy trap" (https://en.wikipedia.org/wiki/Dummy_...le_(statistics)). BTW: this omission was decided by Stata, but you can manage it using the -ib#.- prefix from -fvvarlist- notation.
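
As a hedged illustration of the -ib#.- prefix, here is a sketch on -nlswork.dta- rather than on your own dataset. Note that -year- is stored with two digits there (so 80 stands for 1980), and the choice of base year is only an example.

```stata
use "https://www.stata-press.com/data/r17/nlswork.dta", clear

* Make 80 the omitted (base) year explicitly instead of letting Stata pick
xtreg ln_wage union tenure ib80.year, fe robust

* Alternatively, use the last year observed in the data as the base
xtreg ln_wage union tenure ib(last).year, fe robust
```

Whichever base is chosen, exactly one year level has to be left out; the prefix only changes which one, and therefore what the remaining year coefficients are measured against.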

Kind regards,
Carlo
(Stata 19.0)
Carl Baier

Join Date: Jun 2022

Posts: 13
#19

24 Jun 2022, 12:05

Originally posted by Carlo Lazzaro

Carl:
you had two different omissions in your first -xtreg, fe- code:
1) -edu-, as it was time-invariant;
2) 1987 in -i.year- was omitted to protect your analysis from the so-called "dummy trap" (https://en.wikipedia.org/wiki/Dummy_...le_(statistics)). BTW: this omission was decided by Stata, but you can manage it using the -ib#.- prefix from -fvvarlist- notation.

Thanks for your answer. I get all your points, but the year 1980 was removed to prevent the dummy trap. And if 1987 had been removed because of the dummy variable trap, shouldn't it have already been removed in a "normal" regression? But I don't think that 1987 was. Don't you think that Wooldridge's explanation would be logical?
Sorry to be so pushy, but I still don't quite get why two year dummies were removed.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17724
#20

25 Jun 2022, 02:40

Carl:
as far as the omission of 1987 in your second code (i.e., when -exp- was plugged into the right-hand side of your regression equation) is concerned, Jeff's explanation makes sense, as -exp- was in all likelihood perfectly collinear with the year dummies once the fixed effects were accounted for, and Stata dropped 1987 to resolve it.
Therefore, in your second code, the omission of -1980- was needed to avoid the dummy trap, whereas the omission of -1987- was due to perfect collinearity with -exp-.
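
The mechanism can be reproduced on simulated data. This is only a sketch under stated assumptions: a balanced panel in which experience rises by exactly one each year for everyone, with made-up variable names (-exp0-, -a_i-, -exper-) and made-up parameter values; it is not your dataset.

```stata
clear
set seed 12345
set obs 200                               // hypothetical: 200 persons
generate id = _n
generate exp0 = floor(10*runiform())      // initial experience differs by person
generate a_i = rnormal()                  // person-specific fixed effect

expand 8                                  // 8 yearly observations per person
bysort id: generate year = 1979 + _n      // years 1980 through 1987
generate exper = exp0 + (year - 1980)     // rises by one per year for everyone
generate ln_wage = 1 + 0.03*exper + a_i + rnormal(0, 0.3)

xtset id year

* The base year is left out to avoid the dummy trap; on top of that, one more
* term should be flagged as omitted because demeaned exper is an exact linear
* combination of the year dummies
xtreg ln_wage exper i.year, fe
```

Dropping -i.year- from the last command, or making experience grow unevenly across persons, should remove the second omission, which is in line with Wooldridge's argument.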

Kind regards,
Carlo
(Stata 19.0)