I am trying to run a difference-in-differences (diff-in-diff) analysis in which the exogenous shocks are tsunamis. One of the response variables is production in a given country in a given year.
I could focus on one tsunami at a time, but then I would have only one observation in the treatment group. Hence, I choose to use data on many tsunamis in different years and different countries. The control group also differs from case to case and is chosen based on pre-treatment covariate balance. Several countries make up the control group for each treatment.
Now, I am not interested in the heterogeneous nature of different tsunamis. Rather, I am interested in the average impact of tsunamis, treating them as if they were all the same treatment, so the result would be a downward-biased estimate of the impact of a large one. Given a tsunami in 1999 in country X, I may choose country Y as a control. But given another tsunami in 1980 in country Y, country Y is the treatment group, and X may well be in the control group. I never allow for overlapping shocks in the following sense: if X is treated in 1990, then it cannot be used as a control for a treatment occurring in 1990-93 (a safe time window).
So, it seems to me that, for each treatment, I would have to include the production of the treated and control countries in the dependent variable for the time periods around the treatment date. The independent variables would be year fixed effects, country fixed effects, a dummy that equals 1 if the country is treated (so it would be 1 for country X in 1990 but 0 for country X in 1980), a dummy that equals 1 in the post-treatment period, and the interaction of these two dummies.
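To make the setup concrete, here is a minimal sketch of the stacked regression I have in mind, on simulated data. Everything in it is an assumption for illustration: the two events, the country labels, the 3-year window, the simulated effect of -2, and the function names `simulate_stack` and `estimate_did` are all hypothetical, and plain least squares via NumPy stands in for whatever estimator one would actually use.

```python
import numpy as np

# Hypothetical events: (treated country, shock year, control countries).
EVENTS = [("X", 1990, ["Y", "Z"]), ("Y", 2000, ["X", "W"])]
WINDOW = 3  # years kept before and after each shock (an assumption)


def simulate_stack(effect=-2.0, noise_sd=0.1, seed=0):
    """Build stacked rows: one block of country-years per tsunami."""
    rng = np.random.default_rng(seed)
    country_fe = {"W": 1.0, "X": 2.0, "Y": 3.0, "Z": 4.0}
    rows = []
    for treated_country, shock_year, controls in EVENTS:
        for country in [treated_country] + controls:
            for year in range(shock_year - WINDOW, shock_year + WINDOW + 1):
                treated = int(country == treated_country)
                post = int(year >= shock_year)
                y = (country_fe[country] + 0.05 * (year - 1980)
                     + effect * treated * post + rng.normal(0.0, noise_sd))
                rows.append((country, year, treated, post, y))
    return rows


def estimate_did(rows):
    """OLS with country FE, year FE, treated, post, and treated*post."""
    countries = sorted({r[0] for r in rows})
    years = sorted({r[1] for r in rows})
    X = np.zeros((len(rows), len(countries) + len(years) + 3))
    y = np.empty(len(rows))
    for i, (country, year, treated, post, outcome) in enumerate(rows):
        X[i, countries.index(country)] = 1.0
        X[i, len(countries) + years.index(year)] = 1.0
        X[i, -3:] = (treated, post, treated * post)
        y[i] = outcome
    # The dummies are collinear (e.g. post is a function of year here), so
    # lstsq returns a minimum-norm solution; the interaction coefficient is
    # still uniquely determined in this design.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[-1]


print(estimate_did(simulate_stack()))
```

Note that in this toy design the post dummy is fully determined by the year, so it is absorbed by the year fixed effects; only the interaction coefficient is of interest. The sketch also ignores standard errors entirely.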
This has the problem that some tsunamis are much larger in magnitude than others, so the treatment is quite heterogeneous. I could run subsample analyses by high/low intensity to address this criticism. An alternative would be to include, for each treatment, its own treatment dummy, its own post-treatment dummy, and the interaction of the two. But this would reduce the degrees of freedom (although the problem might not be too bad if I have many countries in the control group or more than one period before and after the shock date) and yield an unparsimonious model.
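Written out, the two options would look roughly as follows. The notation is my own and only a sketch: $e$ indexes tsunami events, $c$ countries, $t$ years, $y_{ect}$ is production, $\alpha_c$ and $\gamma_t$ are country and year fixed effects, $T_{ec}$ is the treatment dummy and $P_{et}$ the post-treatment dummy for event $e$.

```latex
\begin{align}
  % Pooled model: one common effect \delta for all tsunamis
  y_{ect} &= \alpha_c + \gamma_t + \beta_1 T_{ec} + \beta_2 P_{et}
             + \delta \, (T_{ec} \times P_{et}) + \varepsilon_{ect} \\
  % Per-event model: separate coefficients for each tsunami e
  y_{ect} &= \alpha_c + \gamma_t + \beta_{1e} T_{ec} + \beta_{2e} P_{et}
             + \delta_e \, (T_{ec} \times P_{et}) + \varepsilon_{ect}
\end{align}
```

With $E$ events, the second model has $3E$ coefficients on the dummies and their interactions instead of 3, which is where the loss of degrees of freedom comes from.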
My question is whether my approach is problematic for some reason I am missing, and whether there is any recommended literature dealing with this.
Thanks in advance.