Clustering standard errors while also including fixed effects on same level

Valentina Cardenas

Join Date: Apr 2020

Posts: 4
#1


02 May 2020, 15:41

Hi,

If my model includes fixed effects at the same level I am clustering on, is one of the two redundant? (The level is year of birth, i.e. cohort.)

A second, independent question on clustering: I am using cohorts as my time variable in a difference-in-differences design, and I observe individuals' education once they are beyond school-completion age. I want to cluster at the village level, but my village variable refers to the village of residence *when surveyed*, whereas my outcome variable (education) was determined earlier, when individuals were children. The village of residence relevant to their education may therefore have been a different one. Can I still cluster at the village level using this potentially inaccurate village-of-residence variable?

Many thanks.
Tags: clustering, diffindiff, fixed effects
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2175
#2

02 May 2020, 15:50

It can make logical sense to cluster at the same level at which you include fixed effects. The first issue is: At what level should you cluster? You should not cluster simply because you observe the village of residence. Was there an intervention at the village level that affected education levels? Were the data obtained by a cluster sampling scheme with the villages being the clusters? These are the two important questions.

Often there is, essentially, random sampling. If so, the only issue is the level of the policy assignment. I suspect you're analyzing a policy done at the village level (even though you may not have the correct village). If so, your best hope is to cluster at the village level that you observe. And include fixed effects if you think the variation in education is related to unobserved village effects.

JW
Valentina Cardenas

Join Date: Apr 2020

Posts: 4
#3

03 May 2020, 03:26

Thank you very much for your helpful response Jeff.

I am using DHS data on Uganda, which uses a stratified two-stage cluster design for sampling: clusters (villages) are first selected non-randomly, and households within them are then selected randomly.

Background: I am trying to evaluate the role-model effect of a parliamentary gender quota (which abruptly boosted female representation from zero to 30 women) on women's education. I use men as my control group. I use DiD with old versus young cohorts as my time variable to investigate whether girls from cohorts whose schooling was exposed to the quota go on to reach a higher education level.
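For a two-stage design like this, one option is to declare the survey structure once and let Stata compute design-based (cluster-robust) standard errors. The lines below are only a sketch: `psu`, `stratum` and `wgt` are placeholder names for the cluster identifier, the stratification variable and the sampling weight, which will be named differently in the actual DHS files, and the regression is the one given later in this post.

```stata
* Placeholder variable names (assumptions): psu, stratum, wgt
* Declare clusters (villages) as primary sampling units, with strata and weights
svyset psu [pweight = wgt], strata(stratum)

* Design-based standard errors: clustered on psu within strata
svy: regress educ i.female i.cohort 1.post#1.female i.region i.region#i.cohort
```

Under `svy:`, the standard errors account for clustering at the PSU level, so no separate `vce(cluster ...)` option is specified.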

My DiD regression looks as follows:

Code:

reg educ female i.cohort i.post#i.female i.region i.region#i.cohort

(I have 14 cohorts, 9 regions and over 1000 villages)

Is this correct:
I would cluster by village to adjust for the fact that individuals from the same village experience the same unobserved village-level effects so their education levels would be correlated?

I would cluster by cohort to adjust for the fact that individuals from the same cohort experience the same unobserved cohort-level effects so their education levels would be correlated?

Regarding the cohort case, since I already include cohort fixed effects, I do not understand the difference between clustering by cohort and adding cohort fixed effects (i.cohort).
Am I correct in saying cohort fixed effects adjust the DiD coefficient for cohort-level effects while clustering just adjusts the standard error of the DiD coefficient for cohort-level effects?
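One way to see the distinction is to fit the same specification under different `vce()` options: the coefficients do not change, only the standard errors do. This is a minimal sketch, assuming the village identifier is stored in a variable called `village` (a hypothetical name) and that `post` is a function of cohort, so that its main effect is absorbed by `i.cohort`.

```stata
* Hypothetical variable name: village (village of residence at survey time)
* The fixed effects are identical in all three fits; only the vce() option differs

regress educ i.female i.cohort 1.post#1.female i.region i.region#i.cohort, vce(robust)
estimates store het_robust

regress educ i.female i.cohort 1.post#1.female i.region i.region#i.cohort, vce(cluster village)
estimates store cl_village

regress educ i.female i.cohort 1.post#1.female i.region i.region#i.cohort, vce(cluster cohort)
estimates store cl_cohort

* Compare the DiD coefficient and its standard error across the three fits
estimates table het_robust cl_village cl_cohort, keep(1.post#1.female) b se
```

The `b` row should be the same in every column, while the `se` row differs. Fixed effects change what is estimated, whereas clustering changes only how its uncertainty is computed.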

(I know the number of cohort clusters is suboptimal, but I want to at least test an alternative to village clustering, because of the aforementioned issue that the observed village may differ from the village of residence during the education years.)
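With only 14 cohort clusters, the usual cluster-robust standard errors can be unreliable. A commonly used alternative is the wild cluster bootstrap. The sketch below assumes the user-written `boottest` command is installed from SSC; it is not part of official Stata. The number of replications, the Webb weights and the seed are illustrative choices, not recommendations from this thread.

```stata
* Assumes the user-written command has been installed: ssc install boottest
regress educ i.female i.cohort 1.post#1.female i.region i.region#i.cohort, vce(cluster cohort)

* Wild cluster bootstrap test of H0: the DiD coefficient equals zero
* reps(), weighttype() and seed() values are illustrative only
boottest 1.post#1.female, reps(9999) weighttype(webb) seed(12345) nograph
```

The bootstrap p-value and confidence interval can then be set against the conventional clustered results to see how sensitive inference is to the small number of clusters.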

Secondary question about fixed effects:
I add region & region-cohort FE because a lot of governance elements and education budgets are decentralized, so region-level infrastructure and education spending could influence education. However, adding these barely changes my coefficient of interest.

You mention I should

include fixed effects if you think the variation in education is related to unobserved x-level effects

In a DiD, do I add FE because variation in education between individuals in general is related to unobserved x-level effects, or because variation in education between my treatment and control groups (girls and boys) is related to unobserved x-level effects (i.e. because the education gender gap could be related to regional effects)?

Thank you very much.

Valentina