To "xi" or not to "xi"?

River Huang

Join Date: Mar 2016
Posts: 1908


02 Mar 2019, 20:23

Dear All, I know that, in general, adding a prefix such as "xi" does not matter for estimation. However, I was asked about a case in which adding "xi" does alter the results. The data are here:

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input long id double year long industry double(y x)
 2 2004 63  .756728  .8355138047833333
 2 2005 63  .751015  .6496188354333334
 2 2006 63 1.385951  .7327375125666667
 2 2007 63 1.898914        .8224521987
 2 2008 63  .581806      1.79040523525
 2 2009 63  .842814 1.2762391918083331
 2 2010 63  .419548  .9888817151916668
 2 2011 63  .272164 1.7063635349166666
 2 2012 63  .294346 2.4439829506666664
 2 2013 63  .228051 1.1389735984916667
 2 2014 63   .30241     1.236348807075
 2 2015 63  .442573  1.812866477783333
 2 2016 63  .273578  3.648327967416666
 2 2017 63  .294642  3.638739865583334
 4 2004  .        .                  .
 4 2005  .        .                  .
 4 2006  .        .                  .
 4 2007  .        .                  .
 4 2008  .        .                  .
 4 2009  .        .                  .
 4 2010  .        .                  .
 4 2011  .        .                  .
 4 2012  .        .                  .
 4 2013  .        .                  .
 4 2014  .        .                  .
 4 2015  .        .                  .
 4 2016  .        .                  .
 4 2017  .        .                  .
 5 2004  .        .                  .
 5 2005  .        .                  .
 5 2006  .        .                  .
 5 2007  .        .                  .
 5 2008  .        .                  .
 5 2009  .        .                  .
 5 2010  .        .                  .
 5 2011  .        .                  .
 5 2012  .        .                  .
 5 2013  .        .                  .
 5 2014  .        .                  .
 5 2015  .        .                  .
 5 2016  .        .                  .
 5 2017  .        .                  .
 6 2004 63   .29287  .8355138047833333
 6 2005 63  .505632  .6496188354333334
 6 2006 63 1.798777  .7327375125666667
 6 2007 63 1.198202        .8224521987
 6 2008 63  .422264      1.79040523525
 6 2009 63  .790204 1.2762391918083331
 6 2010 63  .618863  .9888817151916668
 6 2011 63  .488526 1.7063635349166666
 6 2012 63  .682689 2.4439829506666664
 6 2013 63  .669052 1.1389735984916667
 6 2014 63  .809775     1.236348807075
 6 2015 63 1.223162  1.812866477783333
 6 2016 63  .968647  3.648327967416666
 6 2017  .        .  3.638739865583334
 7 2004  .        .                  .
 7 2005  .        .                  .
 7 2006  .        .                  .
 7 2007  .        .                  .
 7 2008  .        .                  .
 7 2009  .        .                  .
 7 2010  .        .                  .
 7 2011  .        .                  .
 7 2012  .        .                  .
 7 2013  .        .                  .
 7 2014  .        .                  .
 7 2015  .        .                  .
 7 2016  .        .                  .
 7 2017  .        .                  .
 8 2004  .        .                  .
 8 2005  .        .                  .
 8 2006  .        .                  .
 8 2007  .        .                  .
 8 2008  .        .                  .
 8 2009  .        .                  .
 8 2010  .        .                  .
 8 2011  .        .                  .
 8 2012  .        .                  .
 8 2013  .        .                  .
 8 2014  .        .                  .
 8 2015  .        .                  .
 8 2016  .        .                  .
 8 2017  .        .                  .
 9 2004 79  .676721  .8355138047833333
 9 2005 79  .467796  .6496188354333334
 9 2006 79   .75409  .7327375125666667
 9 2007 79 2.684242        .8224521987
 9 2008 79  .701192      1.79040523525
 9 2009 79 1.664904 1.2762391918083331
 9 2010 79 1.864633  .9888817151916668
 9 2011 79 1.034766 1.7063635349166666
 9 2012 79   .75898 2.4439829506666664
 9 2013 79  .911308 1.1389735984916667
 9 2014 79 1.398672     1.236348807075
 9 2015 79  1.65417  1.812866477783333
 9 2016 79 1.099003  3.648327967416666
 9 2017 79  .610722  3.638739865583334
10 2004  .        .                  .
10 2005  .        .                  .
10 2006  .        .                  .
10 2007  .        .                  .
10 2008  .        .                  .
10 2009  .        .                  .
10 2010  .        .                  .
10 2011  .        .                  .
10 2012  .        .                  .
10 2013  .        .                  .
10 2014  .        .                  .
10 2015  .        .                  .
10 2016  .        .                  .
10 2017  .        .                  .
11 2004  .        .                  .
11 2005  .        .                  .
11 2006  .        .                  .
11 2007  .        .                  .
11 2008  .        .                  .
11 2009  .        .                  .
11 2010  .        .                  .
11 2011  .        .                  .
11 2012  .        .                  .
11 2013  .        .                  .
11 2014  .        .                  .
11 2015  .        .                  .
11 2016  .        .                  .
11 2017  .        .                  .
12 2004 28 1.032094  .8355138047833333
12 2005 28  .702338  .6496188354333334
12 2006 28 1.422744  .7327375125666667
12 2007 28 2.440828        .8224521987
12 2008 28  .811734      1.79040523525
12 2009 28 1.779759 1.2762391918083331
12 2010 28 2.745457  .9888817151916668
12 2011 28 1.053197 1.7063635349166666
12 2012 28  1.05299 2.4439829506666664
12 2013 28 1.019698 1.1389735984916667
12 2014 28 1.054209     1.236348807075
12 2015 28 1.492859  1.812866477783333
12 2016 28 1.211967  3.648327967416666
12 2017 28  .946029  3.638739865583334
14 2004 63  .430889  .8355138047833333
14 2005 63  .594332  .6496188354333334
14 2006 63  .945943  .7327375125666667
14 2007  .        .        .8224521987
14 2008 63  .890433      1.79040523525
14 2009 63 1.794807 1.2762391918083331
14 2010 63  1.16601  .9888817151916668
14 2011 63  .549908 1.7063635349166666
14 2012 63  .846552 2.4439829506666664
14 2013 63 1.093508 1.1389735984916667
14 2014 63 1.244544     1.236348807075
14 2015 63 2.120411  1.812866477783333
14 2016 63  2.23855  3.648327967416666
14 2017 63 1.423093  3.638739865583334
16 2004 37  .306983  .8355138047833333
16 2005 37  .228051  .6496188354333334
16 2006 37  .232354  .7327375125666667
16 2007 37  .494927        .8224521987
16 2008 37   .29909      1.79040523525
16 2009 37  .559564 1.2762391918083331
16 2010 37  .298449  .9888817151916668
16 2011 37  .228051 1.7063635349166666
16 2012 37  .228051 2.4439829506666664
16 2013 37  .257594 1.1389735984916667
16 2014 37   .35575     1.236348807075
16 2015 37  .983892  1.812866477783333
16 2016 37  .546826  3.648327967416666
16 2017 37  .504983  3.638739865583334
17 2004  .        .                  .
17 2005  .        .                  .
17 2006  .        .                  .
17 2007  .        .                  .
17 2008  .        .                  .
17 2009  .        .                  .
17 2010  .        .                  .
17 2011  .        .                  .
17 2012  .        .                  .
17 2013  .        .                  .
17 2014  .        .                  .
17 2015  .        .                  .
17 2016  .        .                  .
17 2017  .        .                  .
18 2004  .        .                  .
18 2005  .        .                  .
18 2006  .        .                  .
18 2007  .        .                  .
18 2008  .        .                  .
18 2009  .        .                  .
18 2010  .        .                  .
18 2011  .        .                  .
18 2012  .        .                  .
18 2013  .        .                  .
18 2014  .        .                  .
18 2015  .        .                  .
18 2016  .        .                  .
18 2017  .        .                  .
19 2004 14 2.209158  .8355138047833333
19 2005 14 1.927555  .6496188354333334
19 2006 14 2.130961  .7327375125666667
19 2007 14 8.873074        .8224521987
19 2008 14 2.068874      1.79040523525
19 2009 14 3.652264 1.2762391918083331
19 2010 14 3.845202  .9888817151916668
19 2011 14 2.133376 1.7063635349166666
19 2012 14 1.965606 2.4439829506666664
19 2013 14 1.616796 1.1389735984916667
19 2014 14 2.345149     1.236348807075
19 2015 14 5.605507  1.812866477783333
19 2016 14 6.972619  3.648327967416666
19 2017  .        .  3.638739865583334
20 2004  .        .                  .
20 2005  .        .                  .
20 2006  .        .                  .
20 2007  .        .                  .
20 2008  .        .                  .
20 2009  .        .                  .
20 2010  .        .                  .
20 2011  .        .                  .
20 2012  .        .                  .
20 2013  .        .                  .
20 2014  .        .                  .
20 2015  .        .                  .
20 2016  .        .                  .
20 2017  .        .                  .
21 2004 37 1.709002  .8355138047833333
21 2005 37 1.373025  .6496188354333334
21 2006 37 1.841413  .7327375125666667
21 2007 37 2.871234        .8224521987
21 2008 37  .843067      1.79040523525
21 2009 37 2.093308 1.2762391918083331
21 2010 37 2.327557  .9888817151916668
21 2011 37  .658607 1.7063635349166666
21 2012 37  .613905 2.4439829506666664
21 2013 37  .569941 1.1389735984916667
21 2014 37  .738986     1.236348807075
21 2015 37 1.251923  1.812866477783333
21 2016 37 1.200638  3.648327967416666
21 2017 37  .875048  3.638739865583334
22 2004 55 2.963965  .8355138047833333
22 2005 55  2.06641  .6496188354333334
22 2006 55 2.817894  .7327375125666667
22 2007 55 3.534271        .8224521987
22 2008 55 1.347176      1.79040523525
22 2009 55  2.04253 1.2762391918083331
22 2010 55 1.684386  .9888817151916668
22 2011 55 1.012511 1.7063635349166666
22 2012 53 1.085851 2.4439829506666664
22 2013 53 1.478983 1.1389735984916667
22 2014 53 2.123118     1.236348807075
22 2015 53 1.933791  1.812866477783333
end
label values industry industry
label def industry 14 "C15", modify
label def industry 28 "C30", modify
label def industry 37 "C39", modify
label def industry 53 "G55", modify
label def industry 55 "G58", modify
label def industry 63 "K70", modify
label def industry 79 "S90", modify

I estimate the following pairs of regressions:

Code:

xtset id year
tab year, gen(dyear)

// L.x (not OK)
xtreg y L.x i.year i.industry, fe cluster(id)
xi: xtreg y L.x i.year i.industry, fe cluster(id)

You will find that the two sets of results differ. I believe this is caused by different year dummies being included. But why does this happen? The following pair is fine, though.

Code:

// L.x (OK)
xtreg y L.x dyear* i.industry, fe cluster(id)
xi: xtreg y L.x dyear* i.industry, fe cluster(id)

Ho-Chuan (River) Huang
Stata 19.0, MP(4)


River Huang

Join Date: Mar 2016

Posts: 1908
#2

02 Mar 2019, 20:28

Also note that it is OK in the following case.

Code:

webuse grunfeld, clear
xtset company year
xtreg invest L.mvalue i.year, fe cluster(company)
xi: xtreg invest L.mvalue i.year, fe cluster(company)

Since the "grunfeld" data set is balanced, I suspect that my question above is also related to missing values.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment
Clyde Schechter

Join Date: Apr 2014

Posts: 30090
#3

02 Mar 2019, 20:40

You answered your own question: the results differ because different variables get omitted due to collinearity.

The important thing to understand, however, is that the model as a whole is unidentified. Omitting some of the variables is how the model becomes identifiable and estimates are made. Regardless of which variables are omitted for this purpose, the model itself is unchanged and, importantly, the model's predictions are unchanged. Notice, for example, that all three of the R² statistics for the two models are the same. So are the estimates of sigma_u, sigma_e, and rho. And if you calculate the predicted outcomes for the observations, they, too, are identical:

Code:

xtreg y L.x i.year i.industry, fe cluster(id)
predict yhat1, xbu
xi: xtreg y L.x i.year i.industry, fe cluster(id)
predict yhat2, xbu
assert yhat1 == yhat2

The point is that things like the coefficients are not identifiable. There is no way to say that one of these sets of results is right and the other is wrong. They are both "wrong" in the sense that they show you numbers that are artifacts of how the collinearities were resolved, and the parameters you would like to think they represent are unidentifiable. So none of those are meaningful. But aggregate statistics for the model as a whole are identifiable, and they come out the same either way.

So it really has nothing to do with -xi- per se. It's just that when you use -xi-, Stata selects different variables to omit than when you use factor variable notation. But if you modified your factor variable notation by specifying different base values for the variables, you would see similar changes. It all boils down to the unidentifiability of the model.
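
The base-level point can be illustrated with a minimal sketch on simulated data. Everything in it is hypothetical: the panel, the seed, the variable values, and the choice of 2005 and 2008 as base years are assumptions made only so that x depends on year alone, which is the situation that makes the slope unidentified.

```stata
* Hypothetical panel: 4 ids observed 2004-2009, with x a function of year only
clear
set seed 2019
set obs 4
generate id = _n
expand 6
bysort id: generate year = 2003 + _n
generate double x = ln(year - 2000)
generate double y = id + 0.3*(year - 2004) + rnormal()
xtset id year

* Same model, base year 2005
xtreg y L.x ib2005.year, fe
scalar b_a = _b[L.x]
predict double yhat_a, xbu

* Same model, base year 2008
xtreg y L.x ib2008.year, fe
scalar b_b = _b[L.x]
predict double yhat_b, xbu

* The slope depends on the normalization; the fitted values do not
display "slope on L.x, base 2005: " b_a
display "slope on L.x, base 2008: " b_b
assert reldif(yhat_a, yhat_b) < 1e-6 if e(sample)
```

The two displayed slopes will generally differ, while the final -assert- compares the predictions with a small tolerance rather than exact equality, to allow for rounding.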
2 likes
Comment
River Huang

Join Date: Mar 2016

Posts: 1908
#4

02 Mar 2019, 20:50

Dear Clyde, Thanks, and I got your point.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment
River Huang

Join Date: Mar 2016

Posts: 1908
#5

03 Mar 2019, 00:29

Dear Clyde, On second thought, it seems to me that the omission of some of the variables (dummies) can alter the intercept but should not influence the slope coefficient. Any comments?

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment

River Huang

Join Date: Mar 2016
Posts: 1908

03 Mar 2019, 00:30

The results are:

Code:

. // L.x (not OK)
. xtreg y L.x i.year, fe cluster(id)
note: 2017.year omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =        112
Group variable: id                              Number of groups  =          9

R-sq:                                           Obs per group:
     within  = 0.4256                                         min =         11
     between = 0.0178                                         avg =       12.4
     overall = 0.2237                                         max =         13

                                                F(8,8)            =          .
corr(u_i, Xb)  = 0.0058                         Prob > F          =          .

                                     (Std. Err. adjusted for 9 clusters in id)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |
         L1. |   .0616659   .0852046     0.72   0.490    -.1348163     .258148
             |
        year |
       2006  |   .5352382   .1280754     4.18   0.003     .2398957    .8305806
       2007  |   2.045066   .7665683     2.67   0.028     .2773564    3.812776
       2008  |  -.0714743   .1234297    -0.58   0.578    -.3561038    .2131551
       2009  |   .6748936   .1653097     4.08   0.004     .2936888    1.056098
       2010  |   .6788168    .311582     2.18   0.061    -.0396925    1.397326
       2011  |  -.1411296   .1775224    -0.79   0.450    -.5504969    .2682378
       2012  |  -.1744999   .1405368    -1.24   0.250    -.4985783    .1495785
       2013  |  -.1848791   .1607079    -1.15   0.283    -.5554721    .1857139
       2014  |    .176449   .1605802     1.10   0.304    -.1938495    .5467475
       2015  |   .8744082     .39842     2.19   0.059      -.04435    1.793166
       2016  |   .8635172   .5680503     1.52   0.167    -.4464092    2.173444
       2017  |          0  (omitted)
             |
       _cons |   .8822517   .2683177     3.29   0.011       .26351    1.500993
-------------+----------------------------------------------------------------
     sigma_u |  .93546528
     sigma_e |    .758637
         rho |  .60325381   (fraction of variance due to u_i)
------------------------------------------------------------------------------

. xi: xtreg y L.x i.year, fe cluster(id)
i.year            _Iyear_2004-2017    (naturally coded; _Iyear_2004 omitted)
note: _Iyear_2016 omitted because of collinearity
note: _Iyear_2017 omitted because of collinearity

Fixed-effects (within) regression               Number of obs     =        112
Group variable: id                              Number of groups  =          9

R-sq:                                           Obs per group:
     within  = 0.4256                                         min =         11
     between = 0.0178                                         avg =       12.4
     overall = 0.2237                                         max =         13

                                                F(8,8)            =          .
corr(u_i, Xb)  = 0.0058                         Prob > F          =          .

                                     (Std. Err. adjusted for 9 clusters in id)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |
         L1. |  -.4087974   .2462526    -1.66   0.135     -.976657    .1590622
             |
 _Iyear_2005 |  -1.323326   .8705276    -1.52   0.167    -3.330766    .6841146
 _Iyear_2006 |  -.8755444   .9662218    -0.91   0.391    -3.103656    1.352567
 _Iyear_2007 |   .6733878   .2789429     2.41   0.042     .0301443    1.316631
 _Iyear_2008 |  -1.400945   .8391488    -1.67   0.134    -3.336026    .5341356
 _Iyear_2009 |  -.1991908   .5065571    -0.39   0.704    -1.367314    .9689319
 _Iyear_2010 |  -.4371638   .6537928    -0.67   0.523    -1.944813    1.070485
 _Iyear_2011 |  -1.392301   .7934816    -1.75   0.117    -3.222073    .4374706
 _Iyear_2012 |  -1.088123   .6329958    -1.72   0.124    -2.547814    .3715681
 _Iyear_2013 |  -.7514792   .4926841    -1.53   0.166    -1.887611    .3846524
 _Iyear_2014 |   -1.00411   .7458202    -1.35   0.215    -2.723975    .7157544
 _Iyear_2015 |  -.2603394   .3940619    -0.66   0.527    -1.169048    .6483689
 _Iyear_2016 |          0  (omitted)
 _Iyear_2017 |          0  (omitted)
       _cons |   2.598656   .9004006     2.89   0.020     .5223285    4.674984
-------------+----------------------------------------------------------------
     sigma_u |  .93546528
     sigma_e |    .758637
         rho |  .60325381   (fraction of variance due to u_i)
------------------------------------------------------------------------------

Ho-Chuan (River) Huang
Stata 19.0, MP(4)

Comment

Eric de Souza

Join Date: Mar 2014
Posts: 587

03 Mar 2019, 07:08

The difference in results is coming from your data. Because of missing values, you are not estimating with the same data set in the two cases. I simplified the estimation commands to remove inessentials and obtained the same results as you did.

Code:

. tab year, gen(yr)

       year |      Freq.     Percent        Cum.
------------+-----------------------------------
       2004 |         18        7.20        7.20
       2005 |         18        7.20       14.40
       2006 |         18        7.20       21.60
       2007 |         18        7.20       28.80
       2008 |         18        7.20       36.00
       2009 |         18        7.20       43.20
       2010 |         18        7.20       50.40
       2011 |         18        7.20       57.60
       2012 |         18        7.20       64.80
       2013 |         18        7.20       72.00
       2014 |         18        7.20       79.20
       2015 |         18        7.20       86.40
       2016 |         17        6.80       93.20
       2017 |         17        6.80      100.00
------------+-----------------------------------
      Total |        250      100.00

. xtreg y L.x yr2-yr12, fe

Fixed-effects (within) regression               Number of obs     =        112
Group variable: id                              Number of groups  =          9

R-sq:                                           Obs per group:
     within  = 0.4256                                         min =         11
     between = 0.0178                                         avg =       12.4
     overall = 0.2237                                         max =         13

                                                F(12,91)          =       5.62
corr(u_i, Xb)  = 0.0058                         Prob > F          =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |
         L1. |  -.4087974   .2245499    -1.82   0.072    -.8548381    .0372433
             |
         yr2 |  -1.323326   .5112663    -2.59   0.011    -2.338893    -.307758
         yr3 |  -.8755444   .5440859    -1.61   0.111    -1.956304    .2052155
         yr4 |   .6733878   .5372659     1.25   0.213    -.3938249    1.740601
         yr5 |  -1.400945     .51353    -2.73   0.008    -2.421009   -.3808808
         yr6 |  -.1991908   .3720255    -0.54   0.594    -.9381737    .5397921
         yr7 |  -.4371638   .4395471    -0.99   0.323     -1.31027    .4359424
         yr8 |  -1.392301   .4852227    -2.87   0.005    -2.356137   -.4284661
         yr9 |  -1.088123   .3814934    -2.85   0.005    -1.845913    -.330333
        yr10 |  -.7514792   .3274588    -2.29   0.024    -1.401936   -.1010226
        yr11 |   -1.00411   .4608035    -2.18   0.032     -1.91944   -.0887806
        yr12 |  -.2603394   .4456091    -0.58   0.561    -1.145487    .6248083
       _cons |   2.598656   .6168557     4.21   0.000     1.373348    3.823964
-------------+----------------------------------------------------------------
     sigma_u |  .93546528
     sigma_e |    .758637
         rho |  .60325381   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(8, 91) = 18.37                      Prob > F = 0.0000

. xtreg y L.x yr3-yr13, fe

Fixed-effects (within) regression               Number of obs     =        112
Group variable: id                              Number of groups  =          9

R-sq:                                           Obs per group:
     within  = 0.4256                                         min =         11
     between = 0.0178                                         avg =       12.4
     overall = 0.2237                                         max =         13

                                                F(12,91)          =       5.62
corr(u_i, Xb)  = 0.0058                         Prob > F          =     0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |
         L1. |   .0616659   .1433616     0.43   0.668    -.2231044    .3464361
             |
         yr3 |   .5352382   .3702138     1.45   0.152    -.2001461    1.270622
         yr4 |   2.045066   .3760629     5.44   0.000     1.298063    2.792069
         yr5 |  -.0714743   .3584592    -0.20   0.842    -.7835094    .6405608
         yr6 |   .6748936   .3212754     2.10   0.038     .0367194    1.313068
         yr7 |   .6788168   .3344374     2.03   0.045      .014498    1.343136
         yr8 |  -.1411296    .348433    -0.41   0.686    -.8332489    .5509897
         yr9 |  -.1744999   .3223129    -0.54   0.590     -.814735    .4657351
        yr10 |  -.1848791    .328532    -0.56   0.575    -.8374675    .4677094
        yr11 |    .176449   .3405734     0.52   0.606    -.5000582    .8529562
        yr12 |   .8744082   .3361134     2.60   0.011     .2067602    1.542056
        yr13 |   .8635172   .3336195     2.59   0.011     .2008231    1.526211
       _cons |   .8822517   .3411809     2.59   0.011     .2045377    1.559966
-------------+----------------------------------------------------------------
     sigma_u |  .93546528
     sigma_e |    .758637
         rho |  .60325381   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(8, 91) = 18.37                      Prob > F = 0.0000

Comment

Eric de Souza

Join Date: Mar 2014

Posts: 587
#8

03 Mar 2019, 07:23

On edit: If I do the same thing with the grunfeld data set, I get identical results for the coefficients on mvalue and kstock with invest as dependent variable.
Comment

Andrew Musau

Join Date: Oct 2014
Posts: 10190

03 Mar 2019, 10:06

Maybe the following example helps:

Code:

clear
input float(y x dummy)
1 1 0
2 1 0
3 . 1
4 . 1
end

regress y x dummy

Here, the values of x are missing whenever the dummy equals 1. Because of listwise deletion of missing values, the dummy is effectively an all-zero variable in the estimation sample. With a constant in the model, x is additionally omitted because of collinearity: it equals 1 in both remaining observations.

Code:

. regress y x dummy
note: x omitted because of collinearity
note: dummy omitted because of collinearity

      Source |       SS           df       MS      Number of obs   =         2
-------------+----------------------------------   F(0, 1)         =      0.00
       Model |           0         0           .   Prob > F        =         .
    Residual |          .5         1          .5   R-squared       =    0.0000
-------------+----------------------------------   Adj R-squared   =    0.0000
       Total |          .5         1          .5   Root MSE        =    .70711

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |          0  (omitted)
       dummy |          0  (omitted)
       _cons |        1.5         .5     3.00   0.205    -4.853102    7.853102
------------------------------------------------------------------------------
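
As a small sketch of why both variables drop out, one can look at the two observations that survive listwise deletion. The indicator name insample is introduced here for illustration only.

```stata
clear
input float(y x dummy)
1 1 0
2 1 0
3 . 1
4 . 1
end

* -regress- uses only rows with no missing values (listwise deletion)
generate byte insample = !missing(y, x, dummy)

* In that sample neither x nor dummy varies, so each is collinear with _cons
tabstat x dummy if insample, statistics(n min max)
```

Both variables have identical minimum and maximum in the retained rows, which is what the two collinearity notes in the output above reflect.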

You can verify that you have something similar to the above:

Code:

. tab year, gen(yr)

       year |      Freq.     Percent        Cum.
------------+-----------------------------------
       2004 |         18        7.20        7.20
       2005 |         18        7.20       14.40
       2006 |         18        7.20       21.60
       2007 |         18        7.20       28.80
       2008 |         18        7.20       36.00
       2009 |         18        7.20       43.20
       2010 |         18        7.20       50.40
       2011 |         18        7.20       57.60
       2012 |         18        7.20       64.80
       2013 |         18        7.20       72.00
       2014 |         18        7.20       79.20
       2015 |         18        7.20       86.40
       2016 |         17        6.80       93.20
       2017 |         17        6.80      100.00
------------+-----------------------------------
      Total |        250      100.00

. gen lx=L.x
(135 missing values generated)

. list yr1 lx if yr1==1

     +----------+
     | yr1   lx |
     |----------|
  1. |   1    . |
 15. |   1    . |
 29. |   1    . |
 43. |   1    . |
 57. |   1    . |
     |----------|
 71. |   1    . |
 85. |   1    . |
 99. |   1    . |
113. |   1    . |
127. |   1    . |
     |----------|
141. |   1    . |
155. |   1    . |
169. |   1    . |
183. |   1    . |
197. |   1    . |
     |----------|
211. |   1    . |
225. |   1    . |
239. |   1    . |
     +----------+

The default for factor variables in Stata is to omit the lowest category. Stata thus recognizes that yr1 (the 2004 dummy) is identically 0 in your estimation sample and designates 2005 (the second year) as the base, but it still runs into collinearity issues in estimating the other dummies.

Code:

qui xtreg y L.x i.year, fe cluster(id)
mat list e(b)

. mat list e(b)

e(b)[1,15]
             L.      2005b.       2006.       2007.       2008.       2009.       2010.       2011.       2012.       2013.
             x        year        year        year        year        year        year        year        year        year
y1   .06166586           0   .53523815    2.045066  -.07147432   .67489357   .67881684  -.14112956  -.17449992  -.18487908

          2014.       2015.       2016.      2017o.            
          year        year        year        year       _cons
y1     .176449   .87440816   .86351722           0   .88225171

On the other hand, by choosing a different base in your data and omitting yr1, it is possible not to run into collinearity issues, and that is what happens when using the xi prefix or generating dummies by hand. Bottom line is that you need \(T-1\) dummies for years, and with one dummy effectively equal to 0, there is a problem with your data (or defined sample). reghdfe (from SSC, by Sergio Correia), which does within estimation (demeaning), would have immediately signaled this to you.

Code:

. reghdfe y L.x, absorb(id year) cluster(id)
(converged in 6 iterations)
note: L.x omitted because of collinearity

HDFE Linear regression                            Number of obs   =        112
Absorbing 2 HDFE groups                           F(   0,      8) =       0.00
Statistics robust to heteroskedasticity           Prob > F        =          .
                                                  R-squared       =     0.7048
                                                  Adj R-squared   =     0.6399
                                                  Within R-sq.    =     0.0000
Number of clusters (id)      =          9         Root MSE        =     0.7586

                                     (Std. Err. adjusted for 9 clusters in id)
------------------------------------------------------------------------------
             |               Robust
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |
         L1. |          0  (omitted)
------------------------------------------------------------------------------

Absorbed degrees of freedom:
---------------------------------------------------------------+
 Absorbed FE |  Num. Coefs.  =   Categories  -   Redundant     | 
-------------+-------------------------------------------------|
          id |            0               9              9 *   | 
        year |           12              13              1     | 
---------------------------------------------------------------+
* = fixed effect nested within cluster; treated as redundant for DoF computation
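
The question raised earlier in the thread (why the choice of omitted dummies moves the slope and not just the intercept) can be sketched algebraically. The derivation assumes that lagged x takes a single value \(c_t\) in every year \(t\) of the estimation sample, which appears to hold in the posted data excerpt, where x is the same for every id within a year. The symbols \(c_t\), \(d_{st}\), \(\gamma_s\) and \(\mu_t\) are notation introduced here, not taken from the posts.

Let \(d_{st}\) equal 1 if \(s = t\) and 0 otherwise. Then \(x_{i,t-1} = \sum_s c_s d_{st}\), so

\[
\beta\, x_{i,t-1} + \sum_s \gamma_s d_{st} \;=\; \sum_s \left(\gamma_s + \beta c_s\right) d_{st} .
\]

Only the year effects \(\mu_t = \alpha + \beta c_t + \gamma_t\) (with \(\alpha\) the reported constant) are pinned down by the data. Any value of \(\beta\) can be offset by shifting the \(\gamma_t\). Stata fixes \(\beta\) by setting two year coefficients to zero, say \(\gamma_a = \gamma_b = 0\), which gives

\[
\beta \;=\; \frac{\mu_b - \mu_a}{c_b - c_a} .
\]

With i.year the two zeros are 2005 (base) and 2017 (omitted); with xi they are 2016 and 2017 (2004 is not in the sample). Plugging in values computed from the output posted at 03 Mar 2019, 00:30, namely \(\mu_{2005} \approx 0.9338\), \(\mu_{2016} \approx 1.8576\), \(\mu_{2017} \approx 1.1072\) and \(c_{2005} \approx 0.8355\), \(c_{2016} \approx 1.8129\), \(c_{2017} \approx 3.6483\):

\[
\beta_{\text{i.year}} \approx \frac{1.1072 - 0.9338}{3.6483 - 0.8355} \approx 0.0617,
\qquad
\beta_{\text{xi}} \approx \frac{1.1072 - 1.8576}{3.6483 - 1.8129} \approx -0.4088 .
\]

These agree with the reported coefficients .0616659 and -.4087974, so each "slope" is just the line through two arbitrarily chosen year effects.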

Comment

River Huang

Join Date: Mar 2016

Posts: 1908
#10

03 Mar 2019, 16:10

Dear Eric, Thanks for the examples. I think they are quite similar to those I mentioned above. Say,

Code:

// L.x (OK)
xtreg y L.x dyear* i.industry, fe cluster(id)
xi: xtreg y L.x dyear* i.industry, fe cluster(id)

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment
River Huang

Join Date: Mar 2016

Posts: 1908
#11

03 Mar 2019, 16:13

Dear Andrew, Many thanks for this interesting example.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment
Eric de Souza

Join Date: Mar 2014

Posts: 587
#12

04 Mar 2019, 01:35

Dear River Huang, in reply to your post #10, my point was that it has nothing to do with the use of -xi- or -i.-. Your use of -xi- in one case and of -i.- in the other gave that impression.
Comment
River Huang

Join Date: Mar 2016

Posts: 1908
#13

04 Mar 2019, 03:35

Dear Eric, Thanks. I think that I agree with your explanation that, due to different samples used in the estimation, the results with "xi" are not identical to those without "xi". However, I do not see what should be deleted to make them identical. Any suggestions?

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment
Andrew Musau

Join Date: Oct 2014

Posts: 10190
#14

04 Mar 2019, 06:27

River Huang - I do not see what the point is, because the coefficient on lagged x is not identified: it is collinear with the time dummies. As I show in #9, using year dummies masks this collinearity, whereas reghdfe shows you immediately that you cannot obtain an estimate. To replicate the regressions using year dummies, just specify the base level in the factor variable regression.

Code:

xtreg y L.x ib2016.year, fe cluster(id)
xi: xtreg y L.x i.year, fe cluster(id)
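
A quick way to check for this kind of collinearity before estimating is sketched below on a small simulated panel. The data, the variable values, and the helper names x_lo and x_hi are hypothetical. On real data only the two numbered steps would be needed.

```stata
* Hypothetical panel: 4 ids observed 2004-2009, with x a function of year only
clear
set obs 4
generate id = _n
expand 6
bysort id: generate year = 2003 + _n
generate double x = ln(year - 2000)

* 1. Does x vary across panels within a year?
bysort year: egen double x_lo = min(x)
bysort year: egen double x_hi = max(x)
count if x_lo != x_hi        // 0 means x is purely a function of year

* 2. If it does not, the year dummies reproduce L.x exactly
xtset id year                // also restores the id-year sort order
regress L.x i.year
display "R-squared = " e(r2)
```

An R-squared of 1 (up to rounding) in the auxiliary regression indicates that L.x carries no information beyond the year dummies, so its coefficient cannot be estimated separately from them.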
Comment
River Huang

Join Date: Mar 2016

Posts: 1908
#15

04 Mar 2019, 15:57

Dear Andrew, Thank you so much for these explanations. It really helps.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)
Comment
