Hi all,
I am using the yearly American Community Survey files to estimate the impact of a policy on wages in five states (3 treatment, 2 counterfactual). In each of these states, the observations are given at the individual level, and the number of members in each state is large, at approximately 70,000 or more. I am using a difference-in-differences model such that:
Log_Wages = Policy X Year + Individual_Level_Covariates + State_Level_Covariates + State_Fixed_Effects + Time_Fixed_Effects
I need to account for the clustered nature of the data, but understand that using cluster-robust standard errors (the cluster command, in Stata) with only 5 groups will bias my standard errors downward. I was pointed to the following Donald and Lang paper as an alternative clustering method that accounts for small numbers of groups G, with high numbers of group members n:
Stephen G. Donald & Kevin Lang, 2007. "Inference with Difference-in-Differences and Other Panel Data," The Review of Economics and Statistics, MIT Press, vol. 89(2), pages 221-233, May.
I've read the paper multiple times, but am not sure I understand their method. I believe their two-step process consists of first deriving the difference-in-differences (DID) estimator (without clustering), then regressing the DID estimator on a dummy variable for the policy implementation year, using OLS and a t-distribution for inference on the parameter estimate. Is this correct? I would appreciate any clarification, or an applied example of how this method of clustering might be implemented step by step. I'm also happy to provide more detail on the data I'm currently working with, if that would be helpful.
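To make the question concrete, here is a minimal sketch of one way I could imagine a two-step procedure working. I am not certain this is what the paper intends. It is written in plain Python on simulated data rather than my ACS files. The state and year counts, the 2009 start date, the effect size and all variable names are made up for illustration, and individual-level covariates are left out, so step 1 reduces to state-year cell means. Step 2 regresses those cell means on the policy dummy plus state and year dummies, and the t-statistic would be compared with a t-distribution whose degrees of freedom are the number of cells minus the number of step-2 parameters. That treats the cell-level errors as independent and homoskedastic, which I am assuming rather than asserting.

```python
import random
from statistics import mean

random.seed(0)

# Hypothetical setup (not the real ACS data): 5 states, 8 years,
# policy switches on in 2009 in states 0-2; states 3-4 are comparisons.
STATES = list(range(5))
YEARS = list(range(2005, 2013))
TREATED = {0, 1, 2}
START = 2009
TRUE_EFFECT = 0.05
N_PER_CELL = 200  # tiny compared with ~70,000 per state, to keep this fast


def policy(s, t):
    """Policy dummy: 1 for treated states from the start year onward."""
    return 1.0 if (s in TREATED and t >= START) else 0.0


def ols(X, y):
    """Solve (X'X) b = X'y by Gauss-Jordan; return b and (X'X)^-1."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y | I]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))]
         + [1.0 if i == j else 0.0 for j in range(k)]
         for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        piv = A[c][c]
        A[c] = [v / piv for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)], [A[i][k + 1:] for i in range(k)]


# Step 1: collapse simulated individual log wages to state-year cell means.
cells = []
for s in STATES:
    for t in YEARS:
        shock = random.gauss(0, 0.02)  # shock shared by everyone in the cell
        base = 2.5 + 0.1 * s + 0.01 * (t - 2005) + TRUE_EFFECT * policy(s, t) + shock
        wages = [base + random.gauss(0, 0.5) for _ in range(N_PER_CELL)]
        cells.append((s, t, mean(wages)))

# Step 2: OLS of cell means on constant, policy, state dummies, year dummies.
X = [[1.0, policy(s, t)]
     + [1.0 if s == j else 0.0 for j in STATES[1:]]
     + [1.0 if t == u else 0.0 for u in YEARS[1:]]
     for s, t, _ in cells]
y = [m for _, _, m in cells]
beta, xtx_inv = ols(X, y)

resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
df = len(y) - len(beta)                # cells minus step-2 parameters
s2 = sum(e * e for e in resid) / df    # homoskedastic residual variance
effect = beta[1]                       # coefficient on the policy dummy
se = (s2 * xtx_inv[1][1]) ** 0.5
t_stat = effect / se

print(f"effect={effect:.4f}  se={se:.4f}  t={t_stat:.2f}  df={df}")
```

If something like this is right, I assume the Stata equivalent would be to collapse the data to state-year cells and run an ordinary regression on the collapsed data, but I would welcome corrections on both the logic and the degrees of freedom.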
Thank you!
Claire Cahen