Hi everyone,
I want to estimate a difference-in-differences model in Stata to look at the effect of academy conversion on school attainment levels. To do so, I have created a variable earlyconverters that takes the value 1 for schools that converted pre-2010 (the treatment group) and 0 for schools that converted post-2010 (the control group). The time variable is afterconversion, which takes the value 1 for 0 to 3 years after conversion and 0 for the 4 years leading up to conversion, since the timing of conversion is distributed from 2006 to 2010. To estimate the D-i-D, I use the following model:
reg y i.earlyconverters#i.afterconversion i.year controls, robust
Doing so, the Stata output shows a significant term on 1.earlyconverters#1.afterconversion with the base as (0,0).
But running the regression as
reg y i.earlyconverters##i.afterconversion i.year controls, robust
gives me the same F-statistic, the same R-squared, and identical coefficients and t-statistics on all covariates apart from 1.earlyconverters#1.afterconversion, which is now massively insignificant (with a p-value of 0.981). I assume this has to do with a change in the default category between the two regressions, but I am unable to figure out the precise reason.
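To make the comparison reproducible, here is a minimal sketch on simulated data. The seed, sample size and data-generating process are assumptions made up for illustration (they are not the real school data), and the simulated outcome deliberately has no true interaction effect. The comments rely on the documented factor-variable rule that a##b is shorthand for a b a#b, so the two commands fit the same four cell means but label the (1,1) term differently.

```stata
* Simulated data only: names mirror the post, values are invented
clear
set seed 2024
set obs 2000
generate byte earlyconverters = runiform() < 0.5
generate byte afterconversion = runiform() < 0.5
* Group and period shifts, but no true interaction effect
generate double y = 50 + 3*earlyconverters + 2*afterconversion + rnormal(0, 5)

* Specification 1: cell indicators only, base cell is (0,0).
* The 1#1 coefficient is the (1,1) cell mean minus the (0,0) cell mean.
regress y i.earlyconverters#i.afterconversion, robust

* Difference-in-differences rebuilt from the three cell coefficients
lincom 1.earlyconverters#1.afterconversion ///
     - 1.earlyconverters#0.afterconversion ///
     - 0.earlyconverters#1.afterconversion

* Specification 2: ## adds both main effects, so the 1#1 coefficient
* is now the interaction (the difference-in-differences) itself
regress y i.earlyconverters##i.afterconversion, robust
```

If this reading is right, the lincom result after the first regression should match the 1.earlyconverters#1.afterconversion row of the second regression, which would explain identical fit statistics alongside a very different coefficient on that one term.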
Secondly, I want to extend the analysis to allow pre- and post-treatment effects that vary by year, as opposed to a single post-treatment effect, i.e. an estimate of earlyconverters#(4 years before conversion), earlyconverters#(3 years before conversion), and so on all the way to 3 years after conversion.
The estimation I attempted was
reg y i.earlyconverters#i.treat_year i.year controls, robust
where treat_year takes the value of 0 for 4 years before conversion all the way to 7 which is 3 years after conversion.
Interpreting the output made me realize that all the coefficient values and their significance are relative to the default category, which in this case is (0,0), that is, a school that converts to an academy post-2010 and is 4 years away from that conversion. I don't think this is intuitive. What I want my regression to show is the effect of conversion at each yearly interval: the difference in outcomes for an academy 4 years prior to conversion compared with a non-academy 4 years prior to conversion, the difference in outcomes for an academy 1 year after conversion compared with a non-academy 1 year after conversion, and so on for all 8 time periods, i.e. a relative default category for each time period.
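One possible way to express these per-period comparisons, sketched on simulated data and not verified against the real data, is to fit the full factorial with ## and then combine coefficients with lincom. The data-generating process below is invented for illustration; treat_year follows the coding described above (0 = 4 years before conversion, 4 = year of conversion, 7 = 3 years after).

```stata
* Simulated data only: names mirror the post, values are invented
clear
set seed 12345
set obs 4000
generate byte earlyconverters = runiform() < 0.5
generate byte treat_year = floor(8*runiform())      // integers 0 to 7
generate double y = 50 + 2*earlyconverters + 0.5*treat_year ///
    + 1.5*earlyconverters*(treat_year >= 4) + rnormal(0, 5)

* Full factorial: main effects plus all earlyconverters x treat_year terms
regress y i.earlyconverters##i.treat_year, robust

* Early-vs-late gap in the base period (treat_year == 0) is the main effect
lincom 1.earlyconverters

* Gap in period k = main effect + the interaction term for period k
forvalues k = 1/7 {
    lincom 1.earlyconverters + 1.earlyconverters#`k'.treat_year
}
```

Each lincom line compares the two groups within the same relative year, which is the "relative default category" idea. The interaction coefficients themselves measure how each period's gap differs from the gap in the base period; writing ib3.treat_year instead of i.treat_year would make the last pre-conversion year that reference.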
I am struggling to come up with a regression that would get me this result, and I would be very grateful if I could be pointed in the right direction.
Thank you,
Yash