Difference-in-differences in Stata with a post-treatment effect by year

Yash Chaudhary

Join Date: Dec 2017

Posts: 9
#1

06 Dec 2017, 11:16

Hi everyone,

I want to estimate a difference-in-differences model in Stata to examine the effect of academy conversion on school attainment levels. To do so, I have created a variable earlyconverters that takes the value 1 for schools that converted pre-2010 (the treatment group) and 0 for schools that converted post-2010 (the control group). The time variable is afterconversion, which takes the value 1 for 0 to 3 years after conversion and 0 for the 4 years leading up to conversion, since the timing of conversion is spread from 2006 to 2010. To estimate the D-i-D, I use the following model:

reg y i.earlyconverters#i.afterconversion i.year controls, robust

Doing so, the Stata output shows a significant term on 1.earlyconverters#1.afterconversion with the base as (0,0).
But running the regression as

reg y i.earlyconverters##i.afterconversion i.year controls, robust

gives me the same F-statistic, the same R-squared, and identical coefficients and t-statistics on all covariates apart from 1.earlyconverters#1.afterconversion, which is now massively insignificant (with a p-value of 0.981). I assume this has to do with a change in the default category between the two regressions, but I am unable to figure out the precise reason.

Secondly, I want to extend the analysis to allow for variable post and pre-treatment effects by year as opposed to a single post-treatment effect, i.e. an estimate of earlyconverters#(4 years before conversion), earlyconverters#(3 years before conversion) all the way to 3 years after conversion.

The estimation I attempted was

reg y i.earlyconverters#i.treat_year i.year controls, robust

where treat_year takes the value of 0 for 4 years before conversion all the way to 7 which is 3 years after conversion.

Interpreting the output made me realize that all the coefficient values and significance levels are relative to the default category, which in this case is (0,0): a school that converts to an academy post-2010 and is 4 years away from that conversion. I don't think this is intuitive. What I want my regression to show is the effect of conversion at each yearly interval: the difference in outcomes for an academy 4 years prior to conversion compared to a non-academy 4 years prior to conversion, the difference in outcomes for an academy 1 year after conversion compared to a non-academy 1 year after conversion, and so on for all 8 time periods, i.e. relative default categories for each time period.

I am struggling to come up with a regression that would get me this result, and I would be very grateful if I could be pointed in the right direction.

Thank you,
Yash
Clyde Schechter

Join Date: Apr 2014

Posts: 30160
#2

06 Dec 2017, 11:29

To estimate the D-i-D, I use the following model -

reg y i.earlyconverters#i.afterconversion i.year controls, robust

Doing so, the Stata output shows a significant term on 1.earlyconverters#1.afterconversion with the base as (0,0).
But running the regression as

reg y i.earlyconverters##i.afterconversion i.year controls, robust

gives me the same F-statistic, the same R-squared, and identical coefficients and t-statistics on all covariates apart from 1.earlyconverters#1.afterconversion, which is now massively insignificant (with a p-value of 0.981).

This is confusing, because the two regressions you show here are absolutely identical.

From the sentence that followed, I infer that you somehow changed some base category(ies), but you don't show that or explain what you did. In any case, a change in the base category would produce exactly the kind of results you describe: coefficients of the affected terms change, but nothing else does. If you ran -predict- after each model, you would find that they give exactly the same predicted values in every observation. That's because it's the same model, just with a different parameterization. In particular, when you look at an interaction coefficient, you are just changing which differences are being highlighted in the regression output. To get a parameterization-invariant view of what your model is telling you, you should look at the output of

Code:

margins earlyconverters#afterconversion
margins earlyconverters, dydx(afterconversion)

The first of these will give you the expected values of y in each of the four combinations of earlyconverters and afterconversion. The second will show you the change in y (marginal effect) from before conversion to after in each group. Changing base categories won't alter any of these. And, as a bonus, they are directly and easily understandable, as opposed to the regression coefficients which have to be understood as either conditional or reflecting differences between subsets.
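A minimal sketch of that workflow, assuming the poster's dataset is in memory and borrowing the outcome name schstdks4_cappedpts that appears later in the thread. The model is written with ##, and the other controls are left out for brevity; the closing -lincom- line is an addition for illustration, not something recommended above.

```stata
* Sketch only: assumes the poster's data are loaded; other controls omitted.
* Fit the DID model with the full factorial (##) interaction.
regress schstdks4_cappedpts i.earlyconverters##i.afterconversion i.year, vce(robust)

* Expected outcome in each of the four group-by-period cells.
margins earlyconverters#afterconversion

* Before-to-after change in the outcome within each group.
margins earlyconverters, dydx(afterconversion)

* The difference between those two changes is the interaction coefficient.
lincom 1.earlyconverters#1.afterconversion
```

Reading the two -margins- tables side by side shows where the interaction coefficient comes from: it is the gap between the two before-to-after changes.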

As for your second concern, following that regression have a look at:

Code:

margins treat_year, dydx(earlyconverters)

I take it you are not familiar with the -margins- command. It is one of the best things in all of Stata, and it is really an indispensable tool for understanding DID analyses.* The corresponding section of the manual is quite good, but it is inherently complicated. I recommend you read Richard Williams' excellent https://www3.nd.edu/~rwilliam/stats/Margins01.pdf for a quick, easy introduction to it.

Added: *or any other models with interaction terms.

Last edited by Clyde Schechter; 06 Dec 2017, 11:32.

Yash Chaudhary
#3

06 Dec 2017, 11:48

Thank you so much for the quick reply Clyde. I had a few follow-up questions if you don't mind.
Firstly, as you said, both the regressions are absolutely identical, which is why I cannot understand why the estimated coefficients are so different. I have not done anything manually to alter the base categories; I assumed that was the cause only because nothing else in the output changed. Running the command

margins earlyconverters, dydx(afterconversion)

gives me the same marginal effect for *0.earlyconverters#1.afterconversion as both specifications, but the value for 1.earlyconverters#1.afterconversion is now 0.02218 (p-value = 0.465), compared to 0.092 (p-value = 0.001) in the first specification and -0.0297 (p-value = 0.376) in the second specification.

*Running the same command with the two variables swapped,
margins afterconversion, dydx(earlyconverters)

gave me the same value for 1.earlyconverters#0.afterconversion but yet another value for 1.earlyconverters#1.afterconversion.

Regarding the second concern, I ran the margins command as you suggested and my output was 'not estimable' for all values of treat_year.

*Edits

Last edited by Yash Chaudhary; 06 Dec 2017, 12:04.

Clyde Schechter
#4

06 Dec 2017, 12:14

If you are running the exact same commands on the exact same data and getting different results then something is seriously wrong, and none of the outputs can be trusted. If you did not change the base categories, and you did not change the data, nor the commands, you should get the same results. I suggest you post here the exact commands you ran (don't skip anything in between) and the exact full output you are getting.

It is premature to look into your second question if the basic model is not yet nailed down.

Yash Chaudhary
#5

06 Dec 2017, 12:20

*Column 1 - OLS with only yearly controls*
reg schstdks4_cappedpts i.year i.earlyconverters#i.afterconversion, robust

Linear regression                               Number of obs     =      1,264
                                                F(14, 1249)       =      20.44
                                                Prob > F          =     0.0000
                                                R-squared         =     0.1500
                                                Root MSE          =     .36319

-------------------------------------------------------------------------------------------------
                                |               Robust
            schstdks4_cappedpts |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------------------+----------------------------------------------------------------
                           year |
                          2003  |  -.2799424   .2311619    -1.21   0.226    -.7334508     .173566
                          2004  |  -.2039077    .214009    -0.95   0.341    -.6237646    .2159492
                          2005  |   -.200673    .208557    -0.96   0.336    -.6098338    .2084878
                          2006  |  -.1890817   .2066744    -0.91   0.360     -.594549    .2163857
                          2007  |  -.1416356   .2068891    -0.68   0.494    -.5475241     .264253
                          2008  |  -.0700349   .2070069    -0.34   0.735    -.4761545    .3360846
                          2009  |   -.017034   .2090639    -0.08   0.935    -.4271891    .3931212
                          2010  |   .0354931   .2118959     0.17   0.867    -.3802181    .4512043
                          2011  |   .1322136    .211697     0.62   0.532    -.2831075    .5475346
                          2012  |   .1377421   .2122765     0.65   0.517    -.2787157    .5541999
                          2013  |     .13946   .2152425     0.65   0.517    -.2828169    .5617368
                                |
earlyconverters#afterconversion |
                           0 1  |   .0639332   .0505085     1.27   0.206    -.0351578    .1630241
                           1 0  |   .1236718   .0343608     3.60   0.000     .0562605     .191083
                           1 1  |   .1886763    .040051     4.71   0.000     .1101016    .2672511
                                |
                          _cons |  -.3158101   .2064661    -1.53   0.126    -.7208688    .0892485
-------------------------------------------------------------------------------------------------

.
. *Column 4 - OLS with other control variables*
. reg schstdks4_cappedpts i.year i.earlyconverters#i.afterconversion schks2_eng_exp schks2_eng_abv schks2_mat_exp schks2_mat_abv schks2_sci_exp schks2_sci_abv schfemale schfsm schsen schwhite schblack schasian, robust

Linear regression                               Number of obs     =      1,264
                                                F(26, 1237)       =      91.72
                                                Prob > F          =     0.0000
                                                R-squared         =     0.6161
                                                Root MSE          =     .24526

-------------------------------------------------------------------------------------------------
                                |               Robust
            schstdks4_cappedpts |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------------------+----------------------------------------------------------------
                           year |
                          2003  |   -.103877   .1275629    -0.81   0.416    -.3541406    .1463866
                          2004  |  -.1448208   .1180761    -1.23   0.220    -.3764724    .0868307
                          2005  |  -.2384776   .1152587    -2.07   0.039    -.4646018   -.0123534
                          2006  |  -.2014004   .1183321    -1.70   0.089    -.4335543    .0307534
                          2007  |  -.1801432   .1166253    -1.54   0.123    -.4089485     .048662
                          2008  |  -.1050502   .1172407    -0.90   0.370    -.3350628    .1249623
                          2009  |  -.0631218   .1184605    -0.53   0.594    -.2955276     .169284
                          2010  |  -.0228516   .1215843    -0.19   0.851    -.2613859    .2156827
                          2011  |   .0204458   .1203059     0.17   0.865    -.2155804     .256472
                          2012  |    .000116   .1219015     0.00   0.999    -.2390405    .2392726
                          2013  |   .0026299   .1255981     0.02   0.983    -.2437789    .2490387
                                |
earlyconverters#afterconversion |
                           0 1  |    .051855   .0368492     1.41   0.160    -.0204388    .1241489
                           1 0  |   .0701312   .0216144     3.24   0.001     .0277263    .1125362
                           1 1  |   .0923084   .0266726     3.46   0.001     .0399799    .1446369
                                |
                 schks2_eng_exp |   .8312307   .1904158     4.37   0.000      .457657    1.204804
                 schks2_eng_abv |   1.481489   .2073127     7.15   0.000     1.074766    1.888213
                 schks2_mat_exp |   .6847628   .1934076     3.54   0.000     .3053197    1.064206
                 schks2_mat_abv |   .8014206   .2349212     3.41   0.001     .3405326    1.262309
                 schks2_sci_exp |  -.2579983   .2804386    -0.92   0.358    -.8081862    .2921895
                 schks2_sci_abv |    -.11619   .2360023    -0.49   0.623    -.5791991    .3468191
                      schfemale |   .1705866   .0608823     2.80   0.005     .0511426    .2900306
                         schfsm |  -.2299195   .0758098    -3.03   0.002    -.3786494   -.0811896
                         schsen |  -.1429862    .063454    -2.25   0.024    -.2674755   -.0184968
                       schwhite |  -.0256358   .1326983    -0.19   0.847    -.2859745    .2347029
                       schblack |   .3329815   .1777665     1.87   0.061    -.0157757    .6817386
                       schasian |   .1717201   .1668003     1.03   0.303    -.1555226    .4989629
                          _cons |  -1.163177   .3101663    -3.75   0.000    -1.771687   -.5546668
-------------------------------------------------------------------------------------------------

.
end of do-file

. reg schstdks4_cappedpts i.year i.earlyconverters##i.afterconversion schks2_eng_exp schks2_eng_abv schks2_mat_exp schks2_mat_abv schks2_sci_exp schks2_sci_abv schfemale schfsm schsen schwhite schblack schasian, robust

Linear regression                               Number of obs     =      1,264
                                                F(26, 1237)       =      91.72
                                                Prob > F          =     0.0000
                                                R-squared         =     0.6161
                                                Root MSE          =     .24526

-------------------------------------------------------------------------------------------------
                                |               Robust
            schstdks4_cappedpts |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------------------+----------------------------------------------------------------
                           year |
                          2003  |   -.103877   .1275629    -0.81   0.416    -.3541406    .1463866
                          2004  |  -.1448208   .1180761    -1.23   0.220    -.3764724    .0868307
                          2005  |  -.2384776   .1152587    -2.07   0.039    -.4646018   -.0123534
                          2006  |  -.2014004   .1183321    -1.70   0.089    -.4335543    .0307534
                          2007  |  -.1801432   .1166253    -1.54   0.123    -.4089485     .048662
                          2008  |  -.1050502   .1172407    -0.90   0.370    -.3350628    .1249623
                          2009  |  -.0631218   .1184605    -0.53   0.594    -.2955276     .169284
                          2010  |  -.0228516   .1215843    -0.19   0.851    -.2613859    .2156827
                          2011  |   .0204458   .1203059     0.17   0.865    -.2155804     .256472
                          2012  |    .000116   .1219015     0.00   0.999    -.2390405    .2392726
                          2013  |   .0026299   .1255981     0.02   0.983    -.2437789    .2490387
                                |
              1.earlyconverters |   .0701312   .0216144     3.24   0.001     .0277263    .1125362
              1.afterconversion |    .051855   .0368492     1.41   0.160    -.0204388    .1241489
                                |
earlyconverters#afterconversion |
                           1 1  |  -.0296779   .0334932    -0.89   0.376    -.0953876    .0360319
                                |
                 schks2_eng_exp |   .8312307   .1904158     4.37   0.000      .457657    1.204804
                 schks2_eng_abv |   1.481489   .2073127     7.15   0.000     1.074766    1.888213
                 schks2_mat_exp |   .6847628   .1934076     3.54   0.000     .3053197    1.064206
                 schks2_mat_abv |   .8014206   .2349212     3.41   0.001     .3405326    1.262309
                 schks2_sci_exp |  -.2579983   .2804386    -0.92   0.358    -.8081862    .2921895
                 schks2_sci_abv |    -.11619   .2360023    -0.49   0.623    -.5791991    .3468191
                      schfemale |   .1705866   .0608823     2.80   0.005     .0511426    .2900306
                         schfsm |  -.2299195   .0758098    -3.03   0.002    -.3786494   -.0811896
                         schsen |  -.1429862    .063454    -2.25   0.024    -.2674755   -.0184968
                       schwhite |  -.0256358   .1326983    -0.19   0.847    -.2859745    .2347029
                       schblack |   .3329815   .1777665     1.87   0.061    -.0157757    .6817386
                       schasian |   .1717201   .1668003     1.03   0.303    -.1555226    .4989629
                          _cons |  -1.163177   .3101663    -3.75   0.000    -1.771687   -.5546668
-------------------------------------------------------------------------------------------------

. margins earlyconverters, dydx(afterconversion)

Average marginal effects                        Number of obs     =      1,264
Model VCE    : Robust

Expression   : Linear prediction, predict()
dy/dx w.r.t. : 1.afterconversion

------------------------------------------------------------------------------------
                   |            Delta-method
                   |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
1.afterconversion  |
   earlyconverters |
                0  |    .051855   .0368492     1.41   0.160    -.0204388    .1241489
                1  |   .0221772   .0303451     0.73   0.465    -.0373563    .0817107
------------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

. margins afterconversion, dydx(earlyconverters)

Average marginal effects                        Number of obs     =      1,264
Model VCE    : Robust

Expression   : Linear prediction, predict()
dy/dx w.r.t. : 1.earlyconverters

------------------------------------------------------------------------------------
                   |            Delta-method
                   |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------------+----------------------------------------------------------------
1.earlyconverters  |
   afterconversion |
                0  |   .0701312   .0216144     3.24   0.001     .0277263    .1125362
                1  |   .0404534   .0260016     1.56   0.120    -.0105587    .0914655
------------------------------------------------------------------------------------
Note: dy/dx for factor levels is the discrete change from the base level.

Clyde Schechter
#6

06 Dec 2017, 12:28

These are not not not not not the same. On one occasion you use i.earlyconverters#i.afterconversion, and in the other you use i.earlyconverters##i.afterconversion. The # and ## are not the same thing. And in most situations (including yours) only the analysis with ## is valid.

Read -help fvvarlist- to understand the difference between # and ##.

Yash Chaudhary
#7

06 Dec 2017, 12:36

In my initial post I did mention that I used i.earlyconverters#i.afterconversion in the first regression and i.earlyconverters##i.afterconversion in the second, and that I got an identical R-squared, F-statistic and estimates for the other covariates, with only the estimate for 1.earlyconverters#1.afterconversion changing. My initial question set out to ascertain the difference between the two specifications and to understand why the results were so different.

Also, as you said that the margins command shouldn't depend on the parameterization, why is it that margins earlyconverters, dydx(afterconversion) and margins afterconversion, dydx(earlyconverters) give such different results?

Thank you

Clyde Schechter
#8

06 Dec 2017, 13:17

I see you did say # and ## in the first post, I missed it. I'm sorry. For some reason it didn't catch my eye as it should have. Only in #3 did I perceive it.

These are not simply reparameterizations of each other. They are different model specifications altogether (and the one using only # is incorrect.) They are, in fact, two models in which one includes some variables not present in the other. There is no reason the results should be the same. The model with ## includes i.earlyconverters and i.afterconversion on their own in addition to their interaction term. The model with # contains only the interaction without the "main" effects. (I prefer to call them constituent effects, because the term "main effect" leads people to misinterpret what they actually mean.) Do read -help fvvarlist- for more information about this.
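A short sketch of what the two operators expand to, assuming the poster's data are loaded and leaving the other controls out for brevity. The -lincom- at the end is an illustration added here, not something taken from the posts above; it assembles the DID contrast by hand from the #-only coefficients.

```stata
* Sketch only: assumes the poster's data are loaded; other controls omitted.
* Run from a do-file, since /// continues a command across lines.

* ## is shorthand for the constituent terms plus their interaction
* (see -help fvvarlist-), so these two commands fit the same model.
regress schstdks4_cappedpts i.earlyconverters##i.afterconversion i.year, vce(robust)
regress schstdks4_cappedpts i.earlyconverters i.afterconversion ///
    i.earlyconverters#i.afterconversion i.year, vce(robust)

* With # alone, each reported coefficient compares one cell with the (0,0) cell,
* so the 1 1 coefficient is not the DID estimate. The DID contrast is:
regress schstdks4_cappedpts i.earlyconverters#i.afterconversion i.year, vce(robust)
lincom 1.earlyconverters#1.afterconversion ///
     - 1.earlyconverters#0.afterconversion ///
     - 0.earlyconverters#1.afterconversion
```

The numbers posted in #5 bear this reading out: from the # output, .0923084 - .0701312 - .051855 = -.0296778, which matches the ## interaction coefficient of -.0296779 up to rounding. Whatever one calls the relationship between the two specifications, the coefficient labelled 1 1 answers a different question in each.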

Also, as you said that the margins command shouldn't depend on the parameterization, why is it that margins earlyconverters, dydx(afterconversion) and margins afterconversion, dydx(earlyconverters) give such different results?

These are not different parameterizations of each other, either. These are different questions, so they have different answers. The first asks: in each category of early converters, how much do things change after conversion vs before. The second asks: before (and then, too, after) conversion, how much difference is there between earlyconverters and controls.
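As a check on that reading, the two -margins- tables posted in #5 can be reproduced from the ## coefficients in the same post. The symbols below are introduced here for illustration only: \beta_E, \beta_A and \beta_{EA} stand for the coefficients on 1.earlyconverters, 1.afterconversion and their interaction.

```latex
\begin{aligned}
\text{dydx(afterconversion)},\ \text{earlyconverters}=0 &: \quad \beta_A = 0.051855 \\
\text{dydx(afterconversion)},\ \text{earlyconverters}=1 &: \quad \beta_A + \beta_{EA} = 0.051855 - 0.0296779 = 0.0221771 \\
\text{dydx(earlyconverters)},\ \text{afterconversion}=0 &: \quad \beta_E = 0.0701312 \\
\text{dydx(earlyconverters)},\ \text{afterconversion}=1 &: \quad \beta_E + \beta_{EA} = 0.0701312 - 0.0296779 = 0.0404533
\end{aligned}
```

These agree with the reported .0221772 and .0404534 up to rounding in the last digit. In both tables the gap between the two rows is \beta_{EA}, so the two commands slice the same four cell means in different directions.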

Yash Chaudhary

#9

06 Dec 2017, 13:54

That makes a lot more sense now. Thank you so much for the clarification.
With regard to the second concern about yearly effects, I know you have recommended using margins, but if I were to use a variant of this difference-in-differences specification, what would you suggest the code be? Extending your suggestion by replacing afterconversion with treat_year, I get:

Code:

reg schstdks4_cappedpts i.year i.earlyconverters##i.treat_year schks2_eng_exp schks2_eng_abv schks2_mat_exp schks2_mat_abv schks2_sci_exp schks2_sci_abv schfemale schfsm schsen schwhite schblack schasian, robust baselevels
note: 1.earlyconverters#7.treat_year omitted because of collinearity

Linear regression                               Number of obs     =      1,264
                                                F(37, 1226)       =      66.80
                                                Prob > F          =     0.0000
                                                R-squared         =     0.6212
                                                Root MSE          =     .24471

--------------------------------------------------------------------------------------------
                           |               Robust
       schstdks4_cappedpts |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------------------+----------------------------------------------------------------
                      year |
                     2002  |          0  (base)
                     2003  |  -.1024338   .1266326    -0.81   0.419    -.3508745    .1460069
                     2004  |  -.1475457    .120105    -1.23   0.220    -.3831798    .0880884
                     2005  |  -.2397768    .117702    -2.04   0.042    -.4706965   -.0088572
                     2006  |  -.2032563   .1270421    -1.60   0.110    -.4525002    .0459877
                     2007  |  -.1839323   .1267776    -1.45   0.147    -.4326574    .0647929
                     2008  |  -.1312449     .12989    -1.01   0.312    -.3860763    .1235865
                     2009  |  -.1303111   .1318537    -0.99   0.323    -.3889951    .1283728
                     2010  |  -.1185822   .1347214    -0.88   0.379    -.3828922    .1457278
                     2011  |  -.1446667   .1368345    -1.06   0.291    -.4131223     .123789
                     2012  |  -.1982916   .1411805    -1.40   0.160    -.4752738    .0786905
                     2013  |  -.1482692   .1624127    -0.91   0.361    -.4669068    .1703684
                           |
           earlyconverters |
                        0  |          0  (base)
                        1  |   .0643754   .0519852     1.24   0.216    -.0376145    .1663653
                           |
                treat_year |
                        0  |          0  (base)
                        1  |  -.0128218    .052473    -0.24   0.807    -.1157686    .0901249
                        2  |   .0284751   .0601388     0.47   0.636    -.0895112    .1464613
                        3  |    .080384   .0667417     1.20   0.229    -.0505566    .2113245
                        4  |   .1217063   .0741043     1.64   0.101     -.023679    .2670916
                        5  |   .2249027   .0753269     2.99   0.003     .0771188    .3726866
                        6  |   .2663198   .0857917     3.10   0.002     .0980049    .4346346
                        7  |   .2025586   .0760964     2.66   0.008     .0532649    .3518522
                           |
earlyconverters#treat_year |
                      1 1  |   .0216966    .071522     0.30   0.762    -.1186224    .1620156
                      1 2  |  -.0051781   .0718663    -0.07   0.943    -.1461727    .1358166
                      1 3  |  -.0729474   .0764229    -0.95   0.340    -.2228815    .0769867
                      1 4  |   -.098835   .0793353    -1.25   0.213    -.2544829     .056813
                      1 5  |  -.1309482   .0759802    -1.72   0.085    -.2800138    .0181175
                      1 6  |   -.092823   .0819557    -1.13   0.258     -.253612     .067966
                      1 7  |          0  (omitted)
                           |
            schks2_eng_exp |   .8415962    .190485     4.42   0.000     .4678835    1.215309
            schks2_eng_abv |   1.448585   .2060547     7.03   0.000     1.044326    1.852844
            schks2_mat_exp |   .7042083   .1957173     3.60   0.000     .3202304    1.088186
            schks2_mat_abv |   .8362858    .239637     3.49   0.001     .3661418     1.30643
            schks2_sci_exp |  -.3029916   .2824497    -1.07   0.284    -.8571299    .2511468
            schks2_sci_abv |  -.1603757   .2408857    -0.67   0.506    -.6329696    .3122181
                 schfemale |   .1685228     .06051     2.79   0.005     .0498082    .2872374
                    schfsm |  -.2545218   .0767447    -3.32   0.001    -.4050872   -.1039564
                    schsen |  -.1492971    .062841    -2.38   0.018    -.2725849   -.0260094
                  schwhite |  -.0063786   .1322071    -0.05   0.962    -.2657558    .2529985
                  schblack |   .3288026   .1771433     1.86   0.064    -.0187349    .6763401
                  schasian |   .1924931   .1663859     1.16   0.248    -.1339396    .5189257
                     _cons |  -1.149123   .3028266    -3.79   0.000    -1.743239    -.555007
--------------------------------------------------------------------------------------------

How do I interpret the coefficients on 1.earlyconverters#i.treat_year, and why is (1,7) omitted?
Thank you

Clyde Schechter
#10

06 Dec 2017, 17:34

The reason 1.earlyconverters#7.treat_year is omitted is that there is a collinearity among earlyconverters, i.treat_year, earlyconverters#treat_year and i.year, so you can't have them all. Because of the order in which you listed them in the regress command, Stata chose to omit 1.earlyconverters#7.treat_year, as it came last. If you want to keep that one in, you have to get rid of something else. You could either explicitly omit one of the year indicators by specifying, say, i(2002/2012).year (explicitly omitting 2013.year), or you could just put i.year after i.earlyconverters##treat_year in the -regress- command, and Stata will eliminate a year indicator (probably 2013).
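The two alternatives could be written roughly as follows. This is a sketch assuming the poster's data are loaded, with the other controls left out for brevity; which term Stata actually drops depends on the data, so the output is worth checking for "omitted" notes.

```stata
* Sketch only: assumes the poster's data are loaded; other controls omitted.

* Option 1: leave out one year indicator explicitly (here 2013), so that
* all of the earlyconverters#treat_year terms can stay in the model.
regress schstdks4_cappedpts i(2002/2012).year i.earlyconverters##i.treat_year, vce(robust)

* Option 2: list i.year last and let Stata choose which year indicator to drop.
regress schstdks4_cappedpts i.earlyconverters##i.treat_year i.year, vce(robust)
```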

In principle, there is no need to use -margins-; you can get those differences as linear combinations of the coefficients using -lincom-. But it's tedious and error prone, so I really recommend against it in practice. The reason you are getting "not estimable" results is probably that you have some combination of year, earlyconverters, and treat_year for which there are no observations, or at least none that don't get omitted due to missing values of some other variable. You can force -margins- to give you an answer by adding the -noestimcheck- option to the command. I think that is better than figuring out the -lincom-s.
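A sketch of both routes, assuming the ## model with treat_year from #9 has just been run. The choice of treat_year 3 in the -lincom- line is only an example.

```stata
* Sketch only: run directly after the i.earlyconverters##i.treat_year model.

* Difference between early converters and controls at each value of treat_year.
* noestimcheck tells -margins- to skip its estimability check.
margins treat_year, dydx(earlyconverters) noestimcheck

* The same difference for a single period (treat_year == 3) by hand:
* the earlyconverters coefficient plus the matching interaction coefficient.
lincom 1.earlyconverters + 1.earlyconverters#3.treat_year
```

With the coefficients posted in #9, that -lincom- works out to .0643754 - .0729474 = -.008572, which gives a quick number to compare against the corresponding row of the -margins- output.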

Yash Chaudhary
#11

06 Dec 2017, 18:08

Thank you so much for your guidance Clyde. I understand where I was going wrong and can now proceed with my work having corrected those mistakes. I am extremely grateful for your help.