cluster-invariant independent variable with fixed effects

Maria Nolan

Join Date: Mar 2017

Posts: 4
#1

29 Mar 2017, 15:17

Hi everyone,

I am estimating linear models where the unit of analysis is the county, and counties are nested in states. There is no time component, and the number of counties in each state varies widely. The main independent variable is a dichotomous variable that varies within some of the states but not others. In other words, some states have “ones” or “zeros” in all of their counties, whereas in others the independent variable does vary across counties.

I might be getting this wrong, but my understanding is that if I run a fixed-effects model to account for state-level unobservables, all counties belonging to states where the independent variable of interest does not vary should be dropped from the analysis, as the state fixed-effect and the independent variable are perfectly collinear in those cases.

However, when I run the models in Stata this is not the case. I start with the pooled estimator ignoring the nesting of counties in states:

reg y x, robust

Now, the number of observations remains the same if I do either:

xtset state
xtreg y x, fe vce(cluster state)

or

reg y x i.state, vce(cluster state)

In the latter case, the output gives me coefficients for all states (except of course for the reference category), in other words none are omitted. Shouldn’t some of them be unidentifiable?

I am confused why nothing is being dropped out in the fixed effect model even if the independent variable is invariant for some of the clusters. Is it appropriate to run a fixed effect model with this kind of data, or should I just treat it as a cross-section perhaps with std errors clustered by state?

Thank you.

Last edited by Maria Nolan; 29 Mar 2017, 16:16.
Tags: fixed effects, linear models
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17084
#2

30 Mar 2017, 01:15

Maria:
welcome to the list.
I'm not clear why you're using -xtreg,fe- if your data have no panel structure.
That said, please post what Stata gave you back, too (as per FAQ). Thanks.

Kind regards,
Carlo
(Stata 18.0 SE)

Maria Nolan

Join Date: Mar 2017
Posts: 4

30 Mar 2017, 09:51

Carlo,

Thank you very much for your response. As you point out, I do not have a panel structure since there is no time involved. I am working with hierarchically structured data, with counties nested in states. Substantively, I am only interested in county-level covariates, but as a robustness check I would like to control for state-level heterogeneity by treating the states as fixed effects.

Ignoring the hierarchical structure of the data gives me this:

Code:

reg outcomevar i.treatedcounty, robust

Linear regression                               Number of obs     =      2,228
                                                F(1, 2226)        =      24.75
                                                Prob > F          =     0.0000
                                                R-squared         =     0.0091
                                                Root MSE          =     3.6775

-------------------------------------------------------------------------------
              |               Robust
   outcomevar |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
treatedcounty |
     treated  |  -.8029753   .1613879    -4.98   0.000    -1.119462   -.4864888
        _cons |   6.897014   .0948097    72.75   0.000     6.711089    7.082939
-------------------------------------------------------------------------------

Now if I add state dummies and clustered errors this is what I get:

Code:

reg outcomevar i.treatedcounty i.statecode, vce(cluster statecode) base

Linear regression                               Number of obs     =      2,228
                                                F(0, 30)          =          .
                                                Prob > F          =          .
                                                R-squared         =     0.1947
                                                Root MSE          =     3.3379

                              (Std. Err. adjusted for 31 clusters in statecode)
-------------------------------------------------------------------------------
              |               Robust
   outcomevar |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
treatedcounty |
     control  |          0  (base)
     treated  |  -.0865117   .1314476    -0.66   0.515    -.3549635    .1819402
              |
    statecode |
           1  |          0  (base)
           2  |  -3.436622   .0563347   -61.00   0.000    -3.551673   -3.321572
           3  |  -.5861505   .0563347   -10.40   0.000    -.7012013   -.4710997
           4  |  -1.645042   .0563347   -29.20   0.000    -1.760093   -1.529991
           5  |  -3.496759   .0421241   -83.01   0.000    -3.582788    -3.41073
           6  |  -5.885945    .058682  -100.30   0.000    -6.005789     -5.7661
           7  |  -6.067232   .0563347  -107.70   0.000    -6.182283   -5.952181
           8  |  -2.556956   .0563347   -45.39   0.000    -2.672007   -2.441905
           9  |  -3.826613    .024361  -157.08   0.000    -3.876365   -3.776862
          10  |  -7.198257   .0605076  -118.96   0.000     -7.32183   -7.074684
          11  |   -7.83342   .0018778 -4171.54   0.000    -7.837255   -7.829585
          12  |  -3.674835   .0439005   -83.71   0.000    -3.764491   -3.585178
          13  |  -5.189244   .0606314   -85.59   0.000     -5.31307   -5.065418
          14  |  -3.929479   .0331381  -118.58   0.000    -3.997156   -3.861803
          15  |  -6.057091   .0308632  -196.26   0.000    -6.120122    -5.99406
          16  |  -4.330975   .0037556 -1153.19   0.000    -4.338645   -4.323305
          17  |  -4.237297   .0166918  -253.86   0.000    -4.271386   -4.203208
          18  |   -1.08158   .0563347   -19.20   0.000    -1.196631   -.9665293
          19  |  -5.002663   .0216803  -230.75   0.000    -5.046941   -4.958386
          20  |  -6.748452   .0402259  -167.76   0.000    -6.830604   -6.666299
          21  |  -5.949709    .042251  -140.82   0.000    -6.035997   -5.863421
          22  |    .237328   .0563347     4.21   0.000     .1222772    .3523788
          23  |  -5.576166   .0434477  -128.34   0.000    -5.664898   -5.487434
          24  |  -2.050062   .0093891  -218.34   0.000    -2.069237   -2.030887
          25  |  -2.282927   .0563347   -40.52   0.000    -2.397977   -2.167876
          26  |   .6615083   .0176736    37.43   0.000      .625414    .6976027
          27  |  -2.760879   .0563347   -49.01   0.000     -2.87593   -2.645828
          28  |  -4.950849   .0279136  -177.36   0.000    -5.007857   -4.893842
          29  |  -6.035505   .0431899  -139.74   0.000     -6.12371   -5.947299
          30  |  -2.358259   .0563347   -41.86   0.000     -2.47331   -2.243208
          31  |  -5.282389    .042251  -125.02   0.000    -5.368677   -5.196101
              |
        _cons |    11.5765   .0563347   205.49   0.000     11.46144    11.69155
-------------------------------------------------------------------------------

I use xtreg, fe just to get a shorter output since I am not interested in interpreting the state effects. Isn't that equivalent to manually introducing the state dummies as above? I always thought it was. Essentially I want to remove all variation between states and estimate the parameter of interest relying on the within-state variation only. I noticed there is a slight difference in the standard error when I do xtreg, fe vs including i.statecode as predictor (maybe a degrees of freedom issue?), but overall Stata does seem to be doing the same thing:

Code:

xtset statecode
       panel variable:  statecode (unbalanced)

xtreg outcomevar i.treatedcounty, fe vce(cluster statecode) base

Fixed-effects (within) regression               Number of obs     =      2,228
Group variable: statecode                       Number of groups  =         31

R-sq:                                           Obs per group:
     within  = 0.0001                                         min =          3
     between = 0.2221                                         avg =       71.9
     overall = 0.0091                                         max =        550

                                                F(1,30)           =       0.44
corr(u_i, Xb)  = 0.1941                         Prob > F          =     0.5126

                              (Std. Err. adjusted for 31 clusters in statecode)
-------------------------------------------------------------------------------
              |               Robust
   outcomevar |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
treatedcounty |
     control  |          0  (base)
     treated  |  -.0865117   .1305589    -0.66   0.513    -.3531484    .1801251
              |
        _cons |   6.709859   .0341047   196.74   0.000     6.640208     6.77951
--------------+----------------------------------------------------------------
      sigma_u |  2.2694765
      sigma_e |  3.3379468
          rho |   .3161302   (fraction of variance due to u_i)
-------------------------------------------------------------------------------

So as you can see I have 2,228 observations (counties) nested in 31 clusters (states). There is within-state variation in the relevant county-level independent variable in 21 of the 31 clusters. The other ten have ones or zeros in all of their units. When I introduce state fixed effects, I assume the coefficient on "treatedcounty" is being identified only using the observations in those 21 states?

I am not sure fixed effects is the way to go here since I am throwing away a lot of information, but on the other hand there is the concern of controlling for potential unobserved state effects. Independently of this modeling decision, however, I am puzzled as to whether there is any difference between restricting the sample to those 21 states versus running the analysis with the entire sample of counties and fixed effects?

Thank you again!

Last edited by Maria Nolan; 30 Mar 2017, 09:59.

Clyde Schechter

Join Date: Apr 2014

Posts: 28611
#4

30 Mar 2017, 10:00

I use xtreg, fe just to get a shorter output since I am not interested in interpreting the state effects. Isn't that equivalent to manually introducing the state dummies as above? I always thought it was. Essentially I want to remove all variation between states and estimate the parameter of interest relying on the within-state variation only. I noticed there is a slight difference in the standard error when I do xtreg, fe vs including i.statecode as predictor (maybe a degrees of freedom issue?), but overall

Correct.
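
The degrees-of-freedom explanation is consistent with the two outputs posted above: the ratio of the standard errors, .1314476/.1305589, is about 1.0068, which is what sqrt((2228 - 2)/(2228 - 32)) gives if -regress- counts all 32 estimated parameters in its small-sample adjustment while -xtreg, fe- counts only 2. A minimal sketch on simulated data follows; the data-generating values and the scalar names are made up for illustration, and the parameter counts in the last line are an assumption inferred from those outputs, not something taken from the manual.

```stata
* Simulated data: 30 hypothetical states with 20 counties each
clear
set seed 2017
set obs 30
gen statecode = _n
gen u = rnormal()                          // state-level unobservable
expand 20
gen byte treatedcounty = runiform() < 0.4
gen outcomevar = 5 + u - 0.5*treatedcounty + rnormal()

* (1) explicit state dummies
regress outcomevar treatedcounty i.statecode, vce(cluster statecode)
scalar b_reg  = _b[treatedcounty]
scalar se_reg = _se[treatedcounty]

* (2) within estimator
xtset statecode
xtreg outcomevar treatedcounty, fe vce(cluster statecode)
scalar b_xt  = _b[treatedcounty]
scalar se_xt = _se[treatedcounty]

* Same coefficient; the SEs differ by a degrees-of-freedom factor.
* 31 = treatedcounty + 29 state dummies + constant; 2 = treatedcounty + constant
display "coefficients: " b_reg "  " b_xt
display "SE ratio: " se_reg/se_xt "   sqrt((N-2)/(N-31)): " sqrt((_N - 2)/(_N - 31))
```

If the two numbers on the last line agree, the gap between the two commands is purely a small-sample scaling choice rather than a difference in the underlying estimator.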

The other ten have ones or zeros in all of their units. When I introduce state fixed effects, I assume the coefficient on "treatedcounty" is being identified only using the observations in those 21 states?

Not true. A variable which is constant within county in every county is colinear with the fixed effects and gets omitted. But when it is constant within some counties, that is not a colinearity problem and the variable is retained. The observations in those counties are also retained. And they matter. If you drop those counties from the analysis, the results will change, and they will most likely be wrong due to selection bias from excluding those counties!

Yes, you need the fixed effects here. If there are state-level influences on the outcome variable that apply to all the counties within a state but not to counties in the other states, then you have non-independent observations at the county level. This makes the OLS results invalid: independence of observations is a critical assumption there. You must use -xtreg- for this. Ordinarily, I would hedge this warning by saying that you could ignore this if there are actually no state-level effects on the variable. But the results themselves clearly show that there are: your intrastate correlation estimate came out rho = 0.316. That is a level of non-independence of observations that is far too large to simply ignore.
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17084
#5

30 Mar 2017, 10:48

Maria:
thanks for further clarifications.
Clyde gave, as usual, excellent and comprehensive advice.
I have an aside only: in your first regression (pooled OLS) you robustified the standard errors (SEs): that takes heteroskedasticity into account, but not the violation of independence of observations (an OLS prerequisite): for that, the -vce(cluster)- option comes in handy.
It is also worth noting that the above-mentioned difference does not hold, say, for -xtreg-, where the two SE options do the same job.
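
To make the aside concrete, here is a small sketch on simulated data (the data-generating values and scalar names are invented for illustration). It compares vce(robust) with vce(cluster statecode) under pooled OLS and under -xtreg, fe-.

```stata
* Simulated data: 30 hypothetical states with 20 counties each
clear
set seed 2017
set obs 30
gen statecode = _n
gen u = rnormal()                          // state-level unobservable
expand 20
gen byte treatedcounty = runiform() < 0.4
gen outcomevar = 5 + u - 0.5*treatedcounty + rnormal()

* Pooled OLS: heteroskedasticity-robust vs cluster-robust SEs
regress outcomevar treatedcounty, vce(robust)
scalar se_ols_rob = _se[treatedcounty]
regress outcomevar treatedcounty, vce(cluster statecode)
scalar se_ols_cl = _se[treatedcounty]

* xtreg, fe: vce(robust) is treated as clustering on the panel variable
xtset statecode
xtreg outcomevar treatedcounty, fe vce(robust)
scalar se_fe_rob = _se[treatedcounty]
xtreg outcomevar treatedcounty, fe vce(cluster statecode)
scalar se_fe_cl = _se[treatedcounty]

display "pooled OLS  robust: " se_ols_rob "   cluster: " se_ols_cl
display "xtreg, fe   robust: " se_fe_rob  "   cluster: " se_fe_cl
```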

Kind regards,
Carlo
(Stata 18.0 SE)
Maria Nolan

Join Date: Mar 2017

Posts: 4
#6

30 Mar 2017, 15:29

Thank you so much, Clyde and Carlo.

Clyde, when you said

A variable which is constant within county in every county is colinear with the fixed effects and gets omitted. But when it is constant within some counties, that is not a colinearity problem and the variable is retained.

You meant a county-level variable which is constant in every state would be colinear with the fixed effects and be omitted, but when it is constant within some states only, there is no colinearity problem?

This clarifies the issue. Thanks again for your generous help!
Clyde Schechter

Join Date: Apr 2014

Posts: 28611
#7

30 Mar 2017, 16:58

You meant a county-level variable which is constant in every state would be colinear with the fixed effects and be omitted, but when it is constant within some states only, there is no colinearity problem?

Yes. Sorry for the error.
Jeff Wooldridge

Join Date: Apr 2014

Posts: 1891
#8

30 Mar 2017, 21:54

I think there is still a bit of confusion here. Clyde is correct that a variable will only drop out if it has no within-state variation in any state. And, of course, if that were the case here, Stata would not give you a coefficient estimate on treatedcounty.

Having said that, Maria is also correct in thinking that identification is purely off of the states with some variation across counties within the state. Without any other controls that have some within-state variation, the FE estimate using all of the states will be identical to dropping the 10 states without any variation in treatedcounty.
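
For the single-regressor case, the algebra behind this can be sketched as follows. The notation is introduced here for illustration only: s indexes states, c indexes counties within a state, and bars denote within-state means.

```latex
\hat{\beta}_{FE}
  = \frac{\sum_{s}\sum_{c \in s} (x_{sc} - \bar{x}_{s})(y_{sc} - \bar{y}_{s})}
         {\sum_{s}\sum_{c \in s} (x_{sc} - \bar{x}_{s})^{2}}
```

In a state where x is all zeros or all ones, x_{sc} - \bar{x}_{s} = 0 for every county, so that state adds zero to both the numerator and the denominator.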

To see this, do the following:

Code:

egen totaltreat = sum(treatedcounty), by(statecode)
xtreg outcomevar treatedcounty if totaltreat > 0, fe

You should get an identical answer on the treatedcounty variable as using all of the states.
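
One caveat on the condition used above: totaltreat > 0 drops only the states in which no county is treated, and states in which every county is treated would still be kept. A sketch of a more general flag on simulated data follows; the variable names mintreat, maxtreat and varies are hypothetical, and the data-generating values are invented for illustration.

```stata
* Simulated data: 30 hypothetical states with 20 counties each
clear
set seed 2017
set obs 30
gen statecode = _n
gen u = rnormal()                                      // state-level unobservable
expand 20
gen byte treatedcounty = runiform() < 0.4
replace treatedcounty = 0 if statecode <= 5            // all-control states
replace treatedcounty = 1 if inrange(statecode, 6, 8)  // all-treated states
gen outcomevar = 5 + u - 0.5*treatedcounty + rnormal()

* Flag states in which the treatment indicator actually varies
bysort statecode: egen mintreat = min(treatedcounty)
bysort statecode: egen maxtreat = max(treatedcounty)
gen byte varies = mintreat != maxtreat

* FE estimate on all states, then on the states with variation only
xtset statecode
xtreg outcomevar treatedcounty, fe
scalar b_all = _b[treatedcounty]
xtreg outcomevar treatedcounty if varies, fe
scalar b_sub = _b[treatedcounty]

display "all states: " b_all "   states with variation only: " b_sub
```

With no other regressors in the model, the two coefficients should coincide up to numerical precision.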

Let me also say that I suspect fixed effects will be the only convincing analysis here unless you have some sort of random assignment to treatment and control. Fixed effects is doing what it should: it accounts for state-level differences and identifies the effect off of states that have some variation in treatment status. This is not a bad thing.

In your particular application, you don't get a significant result. The standard error is large relative to the coefficient estimate. That's the way it goes. You might try putting in state-level controls -- as many as you can find -- and use a random effects analysis. But that will always be less robust than fixed effects.

I hope this helps.

Jeff
Clyde Schechter

Join Date: Apr 2014

Posts: 28611
#9

30 Mar 2017, 23:12

Jeff is right in what he says. My statement was accurate, but unclearly worded. The coefficient estimates will be the same whether you include or exclude the counties for which the variable is constant. But the standard errors will be different (and, hence, the confidence intervals and p-values will differ), as will the estimates of the variance components. See the following example:

Code:

set more off
clear
set obs 10
set seed 1234
gen id = _n
gen fe = rnormal(0, 1)
expand 10
gen x = rnormal(0, 1)
gen y = 2.5 + fe + 7*x + rnormal(0, 1)
replace x = 0.5 if id == 9
replace x = -0.5 if id == 10
xtset id
xtreg y x, fe
xtreg y x if id <= 8, fe
Jeff Wooldridge

Join Date: Apr 2014

Posts: 1891
#10

31 Mar 2017, 08:08

I agree with Clyde that the standard errors can be a bit different, but that, I think, is due to degrees-of-freedom conventions. A quirk in the above code causes bigger differences than one would see in practice. Notice that the x is set to constant for ids 9 and 10 after y was generated. I would have generated y after my x variable was generated; otherwise, it introduces a serious misspecification. Now the result on dropping ids 9 and 10 for beta hat is algebraic. But the above way of generating the data exaggerates the likely differences. The small cross-sectional size likely has something to do with it, too.

Code:

set more off
clear
set obs 10
set seed 1234
gen id = _n
gen fe = rnormal(0, 1)
expand 10
gen x = rnormal(0, 1)
replace x = 0.5 if id == 9
replace x = -0.5 if id == 10
gen y = 2.5 + fe + 7*x + rnormal(0, 1)
xtset id
xtreg y x, fe
xtreg y x if id <= 8, fe
Maria Nolan

Join Date: Mar 2017
Posts: 4

#11

01 Apr 2017, 13:25

Thank you very much, Jeff and Clyde, for an informative exchange and the time you dedicate to this forum.

I just have one remaining question: are state clustered errors appropriate in this type of setting, with 20 to 30 clusters? I understand the need to account for potential within-cluster correlation in the error, but I want to make sure the number of groups is large enough, given the asymptotic assumption for cluster-robust inference.

I am analyzing several outcome variables with this hierarchically structured data. With some of my outcomes and always introducing state fixed effects, clustering the errors versus only accounting for heteroskedasticity (by manually introducing the state dummies in Stata with i.statecode and specifying the robust option) can lead to different inferences. The difference in the errors is typically not too large, but in some cases can be large enough to make the difference for statistical significance at conventional levels. I assume the best approach here is to be conservative and opt for the clustered errors, but I wonder if you have any particular suggestions on this.

As for the results when states without any variation in the independent variable are dropped or not, the FE estimates are indeed identical, and there is a very small difference in the standard error (as long as no other county-level controls are included, of course).

With the full sample:

Code:

xtset statecode
xtreg outcomevar treatedcounty, fe vce(cluster statecode)

Fixed-effects (within) regression               Number of obs     =      2,228
Group variable: statecode                       Number of groups  =         31

R-sq:                                           Obs per group:
     within  = 0.0001                                         min =          3
     between = 0.2221                                         avg =       71.9
     overall = 0.0091                                         max =        550

                                                F(1,30)           =       0.44
corr(u_i, Xb)  = 0.1941                         Prob > F          =     0.5126

                              (Std. Err. adjusted for 31 clusters in statecode)
-------------------------------------------------------------------------------
              |               Robust
   outcomevar |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
treatedcounty |  -.0865117   .1305589    -0.66   0.513    -.3531484    .1801251
        _cons |   6.709859   .0341047   196.74   0.000     6.640208     6.77951
--------------+----------------------------------------------------------------
      sigma_u |  2.2694765
      sigma_e |  3.3379468
          rho |   .3161302   (fraction of variance due to u_i)
-------------------------------------------------------------------------------

Dropping states without county-level variation in the independent variable:

Code:

egen totaltreat = sum(treatedcounty), by(statecode)

xtreg outcomevar treatedcounty if totaltreat > 0, fe vce(cluster statecode)

Fixed-effects (within) regression               Number of obs     =      1,797
Group variable: statecode                       Number of groups  =         21

R-sq:                                           Obs per group:
     within  = 0.0001                                         min =          7
     between = 0.0651                                         avg =       85.6
     overall = 0.0011                                         max =        550

                                                F(1,20)           =       0.43
corr(u_i, Xb)  = 0.0613                         Prob > F          =     0.5185

                              (Std. Err. adjusted for 21 clusters in statecode)
-------------------------------------------------------------------------------
              |               Robust
   outcomevar |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
treatedcounty |  -.0865117   .1316146    -0.66   0.518     -.361055    .1880317
        _cons |   6.293216   .0426264   147.64   0.000     6.204299    6.382134
--------------+----------------------------------------------------------------
      sigma_u |  2.1237033
      sigma_e |  3.4129803
          rho |  .27911636   (fraction of variance due to u_i)
-------------------------------------------------------------------------------

Thanks again!

Jeff Wooldridge

Join Date: Apr 2014

Posts: 1891
#12

02 Apr 2017, 21:22

Hi Maria:

Clustering with only 31 clusters, when some of the cluster sizes are really large, is a stretch. It could work okay. I might bootstrap just to see that the results are similar. This doesn't prove it's okay, but hopefully the cluster bootstrap SEs are similar.
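
One way such a check might look is sketched below on simulated data. The data-generating values, the number of replications, the seeds and the scalar names are all arbitrary choices for illustration. With -xtreg-, vce(bootstrap) resamples whole panels, here states, so it serves as a cluster bootstrap.

```stata
* Simulated data: 31 hypothetical states with unequal numbers of counties
clear
set seed 2017
set obs 31
gen statecode = _n
gen u = rnormal()                          // state-level unobservable
gen ncounties = 10 + ceil(40*runiform())   // between 11 and 50 counties per state
expand ncounties
gen byte treatedcounty = runiform() < 0.4
gen outcomevar = 5 + u - 0.5*treatedcounty + rnormal()

xtset statecode

* Analytic cluster-robust SE
xtreg outcomevar treatedcounty, fe vce(cluster statecode)
scalar se_cluster = _se[treatedcounty]

* Cluster (panel) bootstrap SE: states are resampled with replacement
xtreg outcomevar treatedcounty, fe vce(bootstrap, reps(200) seed(12345))
scalar se_boot = _se[treatedcounty]

display "cluster-robust SE: " se_cluster "   cluster-bootstrap SE: " se_boot
```

In real use, more replications than the 200 chosen here would give a more stable bootstrap standard error.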