singleton drops in panel data, which level FE?

Stephen Ch

Join Date: Apr 2022

Posts: 67
#1

singleton drops in panel data, which level FE?

18 May 2023, 12:58

Hi all,

Consider the following panel regression commands (id: hospital ID; county: the county in which the hospital resides; $y: global macro holding the regressand; i.treatment: indicator for the difference-in-differences treatment):

Code:

xtreg $y i.treatment, fe vce(cluster id)
xtreg $y i.treatment, fe vce(cluster county)
reghdfe $y i.treatment, absorb(id county) vce(cluster id)

As you can imagine, a significant number of singleton observations are dropped in (1).
In (2), because I examine the county-level fixed effects, far fewer singleton observations are dropped.
In (3), since the lowest entity level for the FE is hospital id, it drops just as many singletons as (1).

In theory, if there are important time-invariant characteristics that should be absorbed by the within estimator, I should always resort to (2) or (3) even though they drop a lot of singleton observations, correct?
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17739
#2

19 May 2023, 00:35

Stephen:
you should cluster at the -panelid- level.
In addition, -county- is in all likelihood a time-invariant regressor, hence it will be wiped out by the -fe- estimator.
For nested study design, I'd consider -mixed- (that is a cousin of -xtreg,re mle-, though).
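
The point about time-invariant regressors can be checked on a public dataset. What follows is only a sketch: it uses the nlswork example data, with race standing in for a time-invariant variable such as county (an assumption made purely for illustration; the original poster's data are not available here).

```stata
* Sketch on the nlswork example data; race is a stand-in for a
* time-invariant variable such as county (illustrative assumption).
use "https://www.stata-press.com/data/r17/nlswork.dta", clear
xtset idcode year

* The within-panel standard deviation of race is zero:
* it never changes within a person.
xtsum race
display "within sd of race = " r(sd_w)

* The FE (within) estimator therefore cannot identify it;
* Stata flags the race levels as omitted because of collinearity.
xtreg ln_wage tenure i.race, fe vce(cluster idcode)
```

Any regressor with zero within-panel variation is collinear with the panel fixed effects, which is why it drops out.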

Kind regards,
Carlo
(Stata 19.0)
Stephen Ch

Join Date: Apr 2022

Posts: 67
#3

22 May 2023, 06:57

Carlo,
Thanks for the reply!
Yes, this sounds good.

Also, for the -xtreg, fe- command, do you know how many panels (groups) are actually used in estimation? Which part of the output conveys this (as opposed to the total number of observations)?


Carlo Lazzaro

Join Date: Apr 2014
Posts: 17739

22 May 2023, 07:33

Stephen:
you can get the big picture this way:

Code:

. use "https://www.stata-press.com/data/r17/nlswork.dta"
(National Longitudinal Survey of Young Women, 14-24 years old in 1968)

. xtreg ln_wage c.age, fe

Fixed-effects (within) regression               Number of obs     =     28,510
Group variable: idcode                          Number of groups  =      4,710

R-squared:                                      Obs per group:
     Within  = 0.1026                                         min =          1
     Between = 0.0877                                         avg =        6.1
     Overall = 0.0774                                         max =         15

                                                F(1,23799)        =    2720.20
corr(u_i, Xb) = 0.0314                          Prob > F          =     0.0000

------------------------------------------------------------------------------
     ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
         age |   .0181349   .0003477    52.16   0.000     .0174534    .0188164
       _cons |   1.148214   .0102579   111.93   0.000     1.128107     1.16832
-------------+----------------------------------------------------------------
     sigma_u |  .40635023
     sigma_e |  .30349389
         rho |  .64192015   (fraction of variance due to u_i)
------------------------------------------------------------------------------
F test that all u_i=0: F(4709, 23799) = 8.81                 Prob > F = 0.0000

. di 4710*15
70650

. di 4710*6.1
28731

. xtsum

Variable         |      Mean   Std. dev.       Min        Max |    Observations
-----------------+--------------------------------------------+----------------
idcode   overall |  2601.284   1487.359          1       5159 |     N =   28534
         between |              1487.57          1       5159 |     n =    4711
         within  |                    0   2601.284   2601.284 | T-bar = 6.05689
                 |                                            |
year     overall |  77.95865   6.383879         68         88 |     N =   28534
         between |             5.156521         68         88 |     n =    4711
         within  |             5.138271   63.79198   92.70865 | T-bar = 6.05689
                 |                                            |
birth_yr overall |  48.08509   3.012837         41         54 |     N =   28534
         between |             3.051795         41         54 |     n =    4711
         within  |                    0   48.08509   48.08509 | T-bar = 6.05689
                 |                                            |
age      overall |  29.04511   6.700584         14         46 |     N =   28510
         between |             5.485756         14         45 |     n =    4710
         within  |              5.16945   14.79511   43.79511 | T-bar = 6.05308
                 |                                            |
race     overall |  1.303392   .4822773          1          3 |     N =   28534
         between |             .4862111          1          3 |     n =    4711
         within  |                    0   1.303392   1.303392 | T-bar = 6.05689
                 |                                            |
msp      overall |  .6029175   .4893019          0          1 |     N =   28518
         between |             .3982385          0          1 |     n =    4711
         within  |             .3238927  -.3304159   1.536251 | T-bar = 6.05349
                 |                                            |
nev_mar  overall |  .2296795   .4206341          0          1 |     N =   28518
         between |             .3684416          0          1 |     n =    4711
         within  |             .2456558  -.7036538   1.163013 | T-bar = 6.05349
                 |                                            |
grade    overall |  12.53259   2.323905          0         18 |     N =   28532
         between |             2.566536          0         18 |     n =    4709
         within  |                    0   12.53259   12.53259 | T-bar = 6.05904
                 |                                            |
collgrad overall |  .1680451   .3739129          0          1 |     N =   28534
         between |             .4045558          0          1 |     n =    4711
         within  |                    0   .1680451   .1680451 | T-bar = 6.05689
                 |                                            |
not_smsa overall |  .2824441   .4501961          0          1 |     N =   28526
         between |             .4111053          0          1 |     n =    4711
         within  |             .1834446  -.6461273   1.215777 | T-bar = 6.05519
                 |                                            |
c_city   overall |   .357218   .4791882          0          1 |     N =   28526
         between |             .4271586          0          1 |     n =    4711
         within  |             .2490022  -.5761154   1.290551 | T-bar = 6.05519
                 |                                            |
south    overall |  .4095562   .4917605          0          1 |     N =   28526
         between |             .4667982          0          1 |     n =    4711
         within  |             .1597932  -.5237771    1.34289 | T-bar = 6.05519
                 |                                            |
ind_code overall |  7.692973   2.994025          1         12 |     N =   28193
         between |             2.542844          1         12 |     n =    4695
         within  |             1.708429  -1.507027   17.12154 | T-bar =  6.0049
                 |                                            |
occ_code overall |  4.777672   3.065435          1         13 |     N =   28413
         between |              2.86512          1         13 |     n =    4699
         within  |             1.650248  -5.522328   15.44434 | T-bar = 6.04661
                 |                                            |
union    overall |  .2344319   .4236542          0          1 |     N =   19238
         between |             .3341803          0          1 |     n =    4150
         within  |             .2668622  -.6822348   1.151099 | T-bar = 4.63566
                 |                                            |
wks_ue   overall |  2.548095   7.294463          0         76 |     N =   22830
         between |             5.181437          0         76 |     n =    4645
         within  |                6.054  -33.95191   64.38143 | T-bar = 4.91496
                 |                                            |
ttl_exp  overall |  6.215316   4.652117          0   28.88461 |     N =   28534
         between |             3.724221          0    24.7062 |     n =    4711
         within  |             3.484133  -9.642671   20.38091 | T-bar = 6.05689
                 |                                            |
tenure   overall |  3.123836   3.751409          0   25.91667 |     N =   28101
         between |             2.796519          0   21.16667 |     n =    4699
         within  |             2.659784  -14.27894   15.62384 | T-bar = 5.98021
                 |                                            |
hours    overall |  36.55956   9.869623          1        168 |     N =   28467
         between |             7.846585          1       83.5 |     n =    4710
         within  |             7.520712  -2.154726   130.0596 | T-bar = 6.04395
                 |                                            |
wks_work overall |  53.98933   29.03232          0        104 |     N =   27831
         between |             20.64508          0        104 |     n =    4686
         within  |             23.96999  -18.43924    131.156 | T-bar = 5.93918
                 |                                            |
ln_wage  overall |  1.674907   .4780935          0   5.263916 |     N =   28534
         between |              .424569          0   3.912023 |     n =    4711
         within  |               .29266  -.4077221    4.78367 | T-bar = 6.05689

. xtdes

  idcode:  1, 2, ..., 5159                                   n =       4711
    year:  68, 69, ..., 88                                   T =         15
           Delta(year) = 1 unit
           Span(year)  = 21 periods
           (idcode*year uniquely identifies each observation)

Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                         1       1       3         5         9      13      15

     Freq.  Percent    Cum. |  Pattern
 ---------------------------+-----------------------
      136      2.89    2.89 |  1....................
      114      2.42    5.31 |  ....................1
       89      1.89    7.20 |  .................1.11
       87      1.85    9.04 |  ...................11
       86      1.83   10.87 |  111111.1.11.1.11.1.11
       61      1.29   12.16 |  ..............11.1.11
       56      1.19   13.35 |  11...................
       54      1.15   14.50 |  ...............1.1.11
       54      1.15   15.64 |  .......1.11.1.11.1.11
     3974     84.36  100.00 | (other patterns)
 ---------------------------+-----------------------
     4711    100.00         |  XXXXXX.X.XX.X.XX.X.XX

.

Kind regards,
Carlo
(Stata 19.0)


Stephen Ch

Join Date: Apr 2022
Posts: 67

22 May 2023, 08:25

Thanks for the reply.

Also, notice "Number of Groups" gives the rough number as well.
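
The header counts can also be read from the stored results rather than from the printed output. The following is a minimal sketch on the same nlswork data; the variable names used, n_used and first are hypothetical helpers introduced here. Note that the output above reports a minimum of 1 observation per group, so the "Number of groups" shown by -xtreg, fe- includes single-observation panels.

```stata
use "https://www.stata-press.com/data/r17/nlswork.dta", clear
xtset idcode year
quietly xtreg ln_wage c.age, fe

* Panels and observations actually used in estimation
display "groups used       = " e(N_g)
display "observations used = " e(N)
display "obs per group: min " e(g_min) ", avg " e(g_avg) ", max " e(g_max)

* Count panels that contribute exactly one observation
* to the estimation sample (singletons)
generate byte used = e(sample)
bysort idcode: egen n_used = total(used)
egen byte first = tag(idcode) if used
count if first == 1 & n_used == 1
```

The final count gives the number of singleton panels in the estimation sample; these add nothing to the within variation used to estimate the slope.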


Jeff Wooldridge

Join Date: Apr 2014

Posts: 2204
#6

22 May 2023, 08:59

Stephen: A few issues here. First, assuming you did xtset id (or xtset id year), the first two commands result in the usual fixed effects estimator at the hospital level. So they're using the same observations, which means all singleton hospitals are dropped. The same is true for the reghdfe command because you are absorbing by id. As Carlo points out, once you include hospital fixed effects the county FEs are redundant. Have you tried the commands on your data? You should find the point estimates are the same; the standard errors differ by level of clustering.

I'm surprised you're not including time effects in any of the estimations. That can seriously bias the estimated treatment effect if you truly have a panel data set. I assume you do, because otherwise using FE wouldn't make sense.

It's not clear that you should only cluster at the id level. If the policy is entirely or largely determined at the county level then you should cluster at the county level. The recent Abadie, Athey, Imbens, Wooldridge (2023, Quarterly Journal of Economics) paper covers a simple setting (not panel data) but the insights should carry over.

If most of the variation in the policy is at the county level, and not the hospital, I would use

Code:

xtset id year
xtreg y i.treat i.year, fe vce(cluster county)

You might cluster at the id level if the standard errors above are "too large" to be useful. But you have to explain why.
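
The claim that the clustering level changes the standard errors but not the point estimates can be checked with a sketch like the one below. It assumes the nlswork example data rather than the hospital data, and grp is a hypothetical higher-level cluster (blocks of ten consecutive idcodes) invented only so that panels are nested within clusters, as county would be for hospitals.

```stata
use "https://www.stata-press.com/data/r17/nlswork.dta", clear
xtset idcode year

* Hypothetical higher-level cluster: blocks of 10 consecutive idcodes.
* It is constant within idcode, so panels are nested within clusters.
generate long grp = floor(idcode/10)

* Cluster at the panel level
xtreg ln_wage tenure i.year, fe vce(cluster idcode)
scalar b_id  = _b[tenure]
scalar se_id = _se[tenure]

* Cluster at the higher level
xtreg ln_wage tenure i.year, fe vce(cluster grp)
scalar b_grp  = _b[tenure]
scalar se_grp = _se[tenure]

display "coef: " b_id  "  vs  " b_grp
display "se:   " se_id "  vs  " se_grp
```

Both runs use the same estimator and the same estimation sample, so only the variance calculation differs between them.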
Stephen Ch

Join Date: Apr 2022

Posts: 67
#7

22 May 2023, 11:47

Jeff,

Thanks for the reply.

This all makes sense. I have the time fixed effects included, and I have a follow-up question on a different post.

I will be sure to read your co-authored paper in the QJE.

