  • singleton drops in panel data, which level FE?

    Hi all,

    Consider the following panel regressions (id: hospital ID; county: the county in which the hospital is located; $y: global macro holding the regressand; i.treatment: the difference-in-differences indicator):

    Code:
    xtreg $y i.treatment, fe vce(cluster id)
    xtreg $y i.treatment, fe vce(cluster county)
    reghdfe $y i.treatment, absorb(id county) vce(cluster id)
    As you can imagine, a significant number of singleton observations are dropped in (1).
    In (2), because I use county-level fixed effects, far fewer singleton observations are dropped.
    In (3), since the lowest level of fixed effects is the hospital id, it drops just as many singletons as (1).

    In theory, if there are important time-invariant characteristics that the within estimator should absorb, I should always resort to (2) or (3) even though it drops a lot of singleton observations, correct?
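
One way to see how many observations would be singletons at each candidate fixed-effect level is to count group sizes directly. The sketch below uses a small made-up panel; the variable names `n_id` and `n_county` and all data values are hypothetical, chosen only for illustration.

```stata
* Toy panel (made-up values): hospital 3 appears only once
clear
input id county year y
1 10 2001 1.2
1 10 2002 1.5
2 10 2001 0.9
2 10 2002 1.1
3 20 2001 2.0
4 20 2001 1.7
4 20 2002 1.9
end

* Number of observations per hospital and per county
bysort id: generate n_id = _N
bysort county: generate n_county = _N

* Count observations that form a one-observation group at each level
count if n_id == 1
local single_id = r(N)
count if n_county == 1
local single_county = r(N)
display "singletons by id: `single_id'   singletons by county: `single_county'"
```

In this toy panel only hospital 3 is a singleton at the id level, while both counties have several observations, which mirrors the difference between the hospital and county levels described above.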

  • #2
    Stephen:
    you would be better off clustering at the -panelid- level.
    In addition, -county- is in all likelihood a time-invariant regressor, hence it will be wiped out by the -fe- estimator.
    For nested study design, I'd consider -mixed- (that is a cousin of -xtreg,re mle-, though).
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Carlo,
      Thanks for the reply!
      Yes, this sounds good.

      Also, for the -xtreg, fe- command, do you know how many panel units (groups) are actually used in the estimation? Which part of the output reports this (as opposed to the total number of observations)?



      • #4
        Stephen:
        you can get the big picture this way:
        Code:
        . use "https://www.stata-press.com/data/r17/nlswork.dta"
        (National Longitudinal Survey of Young Women, 14-24 years old in 1968)
        
        . xtreg ln_wage c.age, fe
        
        Fixed-effects (within) regression               Number of obs     =     28,510
        Group variable: idcode                          Number of groups  =      4,710
        
        R-squared:                                      Obs per group:
             Within  = 0.1026                                         min =          1
             Between = 0.0877                                         avg =        6.1
             Overall = 0.0774                                         max =         15
        
                                                        F(1,23799)        =    2720.20
        corr(u_i, Xb) = 0.0314                          Prob > F          =     0.0000
        
        ------------------------------------------------------------------------------
             ln_wage | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
        -------------+----------------------------------------------------------------
                 age |   .0181349   .0003477    52.16   0.000     .0174534    .0188164
               _cons |   1.148214   .0102579   111.93   0.000     1.128107     1.16832
        -------------+----------------------------------------------------------------
             sigma_u |  .40635023
             sigma_e |  .30349389
                 rho |  .64192015   (fraction of variance due to u_i)
        ------------------------------------------------------------------------------
        F test that all u_i=0: F(4709, 23799) = 8.81                 Prob > F = 0.0000
        
        . di 4710*15
        70650
        
        . di 4710*6.1
        28731
        
        . xtsum
        
        Variable         |      Mean   Std. dev.       Min        Max |    Observations
        -----------------+--------------------------------------------+----------------
        idcode   overall |  2601.284   1487.359          1       5159 |     N =   28534
                 between |              1487.57          1       5159 |     n =    4711
                 within  |                    0   2601.284   2601.284 | T-bar = 6.05689
                         |                                            |
        year     overall |  77.95865   6.383879         68         88 |     N =   28534
                 between |             5.156521         68         88 |     n =    4711
                 within  |             5.138271   63.79198   92.70865 | T-bar = 6.05689
                         |                                            |
        birth_yr overall |  48.08509   3.012837         41         54 |     N =   28534
                 between |             3.051795         41         54 |     n =    4711
                 within  |                    0   48.08509   48.08509 | T-bar = 6.05689
                         |                                            |
        age      overall |  29.04511   6.700584         14         46 |     N =   28510
                 between |             5.485756         14         45 |     n =    4710
                 within  |              5.16945   14.79511   43.79511 | T-bar = 6.05308
                         |                                            |
        race     overall |  1.303392   .4822773          1          3 |     N =   28534
                 between |             .4862111          1          3 |     n =    4711
                 within  |                    0   1.303392   1.303392 | T-bar = 6.05689
                         |                                            |
        msp      overall |  .6029175   .4893019          0          1 |     N =   28518
                 between |             .3982385          0          1 |     n =    4711
                 within  |             .3238927  -.3304159   1.536251 | T-bar = 6.05349
                         |                                            |
        nev_mar  overall |  .2296795   .4206341          0          1 |     N =   28518
                 between |             .3684416          0          1 |     n =    4711
                 within  |             .2456558  -.7036538   1.163013 | T-bar = 6.05349
                         |                                            |
        grade    overall |  12.53259   2.323905          0         18 |     N =   28532
                 between |             2.566536          0         18 |     n =    4709
                 within  |                    0   12.53259   12.53259 | T-bar = 6.05904
                         |                                            |
        collgrad overall |  .1680451   .3739129          0          1 |     N =   28534
                 between |             .4045558          0          1 |     n =    4711
                 within  |                    0   .1680451   .1680451 | T-bar = 6.05689
                         |                                            |
        not_smsa overall |  .2824441   .4501961          0          1 |     N =   28526
                 between |             .4111053          0          1 |     n =    4711
                 within  |             .1834446  -.6461273   1.215777 | T-bar = 6.05519
                         |                                            |
        c_city   overall |   .357218   .4791882          0          1 |     N =   28526
                 between |             .4271586          0          1 |     n =    4711
                 within  |             .2490022  -.5761154   1.290551 | T-bar = 6.05519
                         |                                            |
        south    overall |  .4095562   .4917605          0          1 |     N =   28526
                 between |             .4667982          0          1 |     n =    4711
                 within  |             .1597932  -.5237771    1.34289 | T-bar = 6.05519
                         |                                            |
        ind_code overall |  7.692973   2.994025          1         12 |     N =   28193
                 between |             2.542844          1         12 |     n =    4695
                 within  |             1.708429  -1.507027   17.12154 | T-bar =  6.0049
                         |                                            |
        occ_code overall |  4.777672   3.065435          1         13 |     N =   28413
                 between |              2.86512          1         13 |     n =    4699
                 within  |             1.650248  -5.522328   15.44434 | T-bar = 6.04661
                         |                                            |
        union    overall |  .2344319   .4236542          0          1 |     N =   19238
                 between |             .3341803          0          1 |     n =    4150
                 within  |             .2668622  -.6822348   1.151099 | T-bar = 4.63566
                         |                                            |
        wks_ue   overall |  2.548095   7.294463          0         76 |     N =   22830
                 between |             5.181437          0         76 |     n =    4645
                 within  |                6.054  -33.95191   64.38143 | T-bar = 4.91496
                         |                                            |
        ttl_exp  overall |  6.215316   4.652117          0   28.88461 |     N =   28534
                 between |             3.724221          0    24.7062 |     n =    4711
                 within  |             3.484133  -9.642671   20.38091 | T-bar = 6.05689
                         |                                            |
        tenure   overall |  3.123836   3.751409          0   25.91667 |     N =   28101
                 between |             2.796519          0   21.16667 |     n =    4699
                 within  |             2.659784  -14.27894   15.62384 | T-bar = 5.98021
                         |                                            |
        hours    overall |  36.55956   9.869623          1        168 |     N =   28467
                 between |             7.846585          1       83.5 |     n =    4710
                 within  |             7.520712  -2.154726   130.0596 | T-bar = 6.04395
                         |                                            |
        wks_work overall |  53.98933   29.03232          0        104 |     N =   27831
                 between |             20.64508          0        104 |     n =    4686
                 within  |             23.96999  -18.43924    131.156 | T-bar = 5.93918
                         |                                            |
        ln_wage  overall |  1.674907   .4780935          0   5.263916 |     N =   28534
                 between |              .424569          0   3.912023 |     n =    4711
                 within  |               .29266  -.4077221    4.78367 | T-bar = 6.05689
        
        . xtdes
        
          idcode:  1, 2, ..., 5159                                   n =       4711
            year:  68, 69, ..., 88                                   T =         15
                   Delta(year) = 1 unit
                   Span(year)  = 21 periods
                   (idcode*year uniquely identifies each observation)
        
        Distribution of T_i:   min      5%     25%       50%       75%     95%     max
                                 1       1       3         5         9      13      15
        
             Freq.  Percent    Cum. |  Pattern
         ---------------------------+-----------------------
              136      2.89    2.89 |  1....................
              114      2.42    5.31 |  ....................1
               89      1.89    7.20 |  .................1.11
               87      1.85    9.04 |  ...................11
               86      1.83   10.87 |  111111.1.11.1.11.1.11
               61      1.29   12.16 |  ..............11.1.11
               56      1.19   13.35 |  11...................
               54      1.15   14.50 |  ...............1.1.11
               54      1.15   15.64 |  .......1.11.1.11.1.11
             3974     84.36  100.00 | (other patterns)
         ---------------------------+-----------------------
             4711    100.00         |  XXXXXX.X.XX.X.XX.X.XX
        
        .
        Kind regards,
        Carlo
        (Stata 19.0)


        • #5
          Thanks for the reply.

          Also, notice "Number of Groups" gives the rough number as well.
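
The group count shown in the header can also be read from the stored results after -xtreg, fe-. The sketch below is a minimal illustration on a made-up panel (all values and the regressor `x` are hypothetical); `e(N_g)`, `e(N)`, `e(g_min)`, `e(g_avg)` and `e(g_max)` are the stored results that correspond to "Number of groups", "Number of obs" and "Obs per group".

```stata
* Toy unbalanced panel (made-up values): 4 hospitals, 7 observations
clear
input id year y x
1 2001 1.2 0.5
1 2002 1.5 0.9
2 2001 0.9 0.1
2 2002 1.1 0.4
3 2001 2.0 1.0
4 2001 1.7 0.7
4 2002 1.9 1.2
end

xtset id year
xtreg y x, fe

* Stored results behind the header of the xtreg output
display "groups used:       " e(N_g)
display "observations used: " e(N)
display "obs per group: min = " e(g_min) ", avg = " e(g_avg) ", max = " e(g_max)
```

Running `ereturn list` after the regression shows everything that is stored, which is handy when the numbers need to go into a table.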



          • #6
            Stephen: A few issues here. First, assuming you did xtset id (or xtset id year), the first two commands result in the usual fixed effects estimator at the hospital level. So they're using the same observations, which means all singleton hospitals are dropped. The same is true for the reghdfe command because you are absorbing by id. As Carlo points out, once you include hospital fixed effects the county FEs are redundant. Have you tried the commands on your data? You should find the point estimates are the same; the standard errors differ by level of clustering.

            I'm surprised you're not including time effects in any of the estimations. That can seriously bias the estimated treatment effect if you truly have a panel data set. I assume you do because otherwise using FE wouldn't make sense.

            It's not clear that you should only cluster at the id level. If the policy is entirely or largely determined at the county level then you should cluster at the county level. The recent Abadie, Athey, Imbens, Wooldridge (2023, Quarterly Journal of Economics) paper covers a simple setting (not panel data) but the insights should carry over.

            If most of the variation in the policy is at the county level, and not the hospital, I would use

            Code:
            xtset id year
            xtreg y i.treat i.year, fe vce(cluster county)
            You might cluster at the id level if the standard errors above are "too large" to be useful. But you have to explain why.
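
The point that the clustering level changes only the standard errors, not the point estimates, can be checked on simulated data. The sketch below is an illustration under assumed values: 6 hypothetical counties with 5 hospitals each, 4 years, and a treatment that switches on in 2002 for counties 1 to 3. None of these numbers come from the original data.

```stata
clear
set seed 12345

* Hypothetical design: 30 hospitals nested in 6 counties
set obs 30
generate id = _n
generate county = ceil(id/5)
generate u = rnormal()            // time-invariant hospital effect

* Four yearly observations per hospital
expand 4
bysort id: generate year = 2000 + _n

* Assumed treatment: counties 1-3 are treated from 2002 onward
generate treat = (county <= 3) & (year >= 2002)
generate y = 0.5*treat + 0.1*(year - 2000) + u + rnormal()

xtset id year

* Same model, clustered at the hospital level
xtreg y i.treat i.year, fe vce(cluster id)
scalar b_id  = _b[1.treat]
scalar se_id = _se[1.treat]

* Same model, clustered at the county level
xtreg y i.treat i.year, fe vce(cluster county)
scalar b_county  = _b[1.treat]
scalar se_county = _se[1.treat]

display "coef: cluster id = " b_id  "   cluster county = " b_county
display "se:   cluster id = " se_id "   cluster county = " se_county
```

With only 6 clusters the county-level standard errors in this toy example are themselves unreliable; the sketch is meant to show the mechanics, not to suggest that so few clusters is adequate.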


            • #7
              Jeff,

              Thanks for the reply.

              This all makes sense. I have the time fixed effects included, and I have a follow-up question on a different post.

              I will be sure to read your co-authored paper in QJE.


