
  • Difference-in-differences in Stata with a post-treatment effect by year

    Hi everyone,

    I want to estimate a difference-in-differences model in Stata of the effect of academy conversion on school attainment. To do so, I have created a variable earlyconverters that takes the value 1 for schools that converted before 2010 (the treatment group) and 0 for schools that converted after 2010 (the control group). The time variable is afterconversion, which takes the value 1 from 0 to 3 years after conversion and 0 for the 4 years leading up to conversion, since the timing of conversion is spread from 2006 to 2010. To estimate the D-i-D, I use the following model:

    reg y i.earlyconverters#i.afterconversion i.year controls, robust

    The Stata output shows a significant coefficient on 1.earlyconverters#1.afterconversion, with (0,0) as the base.
    But running the regression as

    reg y i.earlyconverters##i.afterconversion i.year controls, robust

    gives me the same F-statistic, the same R-squared, and identical coefficients and t-statistics on all covariates except 1.earlyconverters#1.afterconversion, which is now far from significant (p-value of 0.981). I assume this has to do with a change in the base category between the two regressions, but I cannot work out the precise reason.

    Second, I want to extend the analysis to allow the pre- and post-treatment effects to vary by year rather than estimating a single post-treatment effect, i.e. to obtain an estimate for earlyconverters#(4 years before conversion), earlyconverters#(3 years before conversion), and so on through 3 years after conversion.

    The estimation I attempted was

    reg y i.earlyconverters#i.treat_year i.year controls, robust

    where treat_year runs from 0 (4 years before conversion) to 7 (3 years after conversion).
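    A minimal sketch of how treat_year and afterconversion, as defined above, could be built from calendar year and conversion year. It runs on simulated data and assumes a hypothetical variable convyear holding each school's conversion year; neither the simulated schools nor convyear come from the thread, and the real data may be organized differently.

```stata
* Simulated panel: 6 hypothetical schools observed 2002-2013
clear
set seed 2017
set obs 6
generate int school = _n
* convyear is HYPOTHETICAL: each school's conversion year
generate int convyear = cond(school <= 3, 2006 + school, 2010 + school - 3)
* earlyconverters = 1 if the school converted before 2010
generate byte earlyconverters = convyear < 2010
expand 12
bysort school: generate int year = 2001 + _n
* Event time: years relative to conversion (0 = year of conversion)
generate int reltime = year - convyear
* Keep the window from 4 years before to 3 years after conversion
keep if inrange(reltime, -4, 3)
* treat_year: 0 = 4 years before conversion, ..., 7 = 3 years after
generate byte treat_year = reltime + 4
* afterconversion: 1 from 0 to 3 years after conversion, 0 for the 4 years before
generate byte afterconversion = reltime >= 0
tabulate treat_year afterconversion
```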

    Interpreting the output, I realized that all the coefficients and their significance are relative to the base category, which here is (0,0): a school that converts to an academy after 2010 and is 4 years away from that conversion. I don't find this intuitive. What I want the regression to show is the effect of conversion at each yearly interval: the difference in outcomes between an academy 4 years before conversion and a non-academy 4 years before conversion, the difference between an academy 1 year after conversion and a non-academy 1 year after conversion, and so on for all 8 time periods, i.e. a separate comparison within each time period.

    I am struggling to come up with a regression that gives this result, and I would be very grateful to be pointed in the right direction.

    Thank you,
    Yash


  • #2
    Quote (from #1):
    To estimate the D-i-D, I use the following model -

    reg y i.earlyconverters#i.afterconversion i.year controls, robust

    Doing so, the Stata output shows a significant term on 1.earlyconverters#1.afterconversion with the base as (0,0).
    But running the regression as

    reg y i.earlyconverters##i.afterconversion i.year controls, robust

    gives me the same F-statistic, the same R-squared, identical coefficients and t-statistics on all covariates apart from 1.earlyconverters#1.afterconversion which is now massively insignificant (with a p-value of 0.981).
    This is confusing, because the two regressions you show here are absolutely identical.

    From the sentence that followed, I infer that you somehow changed one or more base categories, but you don't show that or explain what you did. In any case, a change in the base category would produce exactly the kind of results you describe: the coefficients of the affected terms change, but nothing else does. If you ran -predict- after each model, you would find that they give exactly the same predicted value for every observation. That's because it's the same model, just with a different parameterization. In particular, when you look at an interaction coefficient, you are just changing which differences are highlighted in the regression output. To get a parameterization-invariant view of what your model is telling you, you should look at the output of

    Code:
    margins earlyconverters#afterconversion
    margins earlyconverters, dydx(afterconversion)
    The first of these will give you the expected values of y in each of the four combinations of earlyconverters and afterconversion. The second will show you the change in y (marginal effect) from before conversion to after in each group. Changing base categories won't alter any of these. And, as a bonus, they are directly and easily understandable, as opposed to the regression coefficients which have to be understood as either conditional or reflecting differences between subsets.
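    A self-contained sketch of the point above, using simulated data and hypothetical variable names (early, after, y) rather than the poster's data. Refitting with ib1.early, so that level 1 of early becomes the base, changes the regression coefficients but should leave the fitted values and the -margins- output unchanged. Only the base of early is changed here: for a factor, dydx() reports the change from its base level, so changing the base of after itself would flip the sign of the dydx(after) results.

```stata
* Simulated 2x2 data; all variable names here are hypothetical
clear
set seed 12345
set obs 500
generate byte early = runiform() < 0.5
generate byte after = runiform() < 0.5
generate double y = 1 + 0.3*early + 0.2*after - 0.1*early*after + rnormal()

* Parameterization 1: level 0 of early is the base (the default)
regress y i.early##i.after, robust
predict double yhat_default, xb
margins early#after
margins early, dydx(after)

* Parameterization 2: level 1 of early is the base
regress y ib1.early##i.after, robust
predict double yhat_base1, xb
margins early#after
margins early, dydx(after)

* Same model, so the fitted values agree up to floating-point rounding
assert reldif(yhat_default, yhat_base1) < 1e-8
```

    Comparing the two pairs of -margins- tables, the numbers should match even though the coefficient tables differ.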

    As for your second concern, following that regression have a look at:

    Code:
    margins treat_year, dydx(earlyconverters)
    I take it you are not familiar with the -margins- command. It is one of the best things in all of Stata, and it is really an indispensable tool for understanding DID analyses.* The corresponding section of the manual is quite good, but the material is inherently complicated. I recommend Richard Williams' excellent https://www3.nd.edu/~rwilliam/stats/Margins01.pdf for a quick, easy introduction.

    Added: *or any other models with interaction terms.
    Last edited by Clyde Schechter; 06 Dec 2017, 11:32.



    • #3
      Thank you so much for the quick reply, Clyde. I have a few follow-up questions, if you don't mind.
      First, as you said, both regressions are absolutely identical, which is why I cannot understand why the estimated coefficients are so different. I have not done anything manually to alter the base categories; I only assumed that was the case because nothing else in the output changed. Running the command

      margins earlyconverters, dydx(afterconversion)

      gives the same marginal effect for *0.earlyconverters#1.afterconversion as both specifications, but the value for 1.earlyconverters#1.afterconversion is now 0.02218 (p-value = 0.465), compared with 0.092 (p-value = 0.001) in the first specification and -0.0297 (p-value = 0.376) in the second specification.

      *Running the same command with the roles of the two variables switched,
      margins afterconversion, dydx(earlyconverters)

      gave me the same value for 1.earlyconverters#0.afterconversion but yet another value for 1.earlyconverters#1.afterconversion.



      Regarding the second concern, I ran the margins command as you suggested, and the output was 'not estimable' for every value of treat_year.

      *Edits
      Last edited by Yash Chaudhary; 06 Dec 2017, 12:04.



      • #4
        If you are running the exact same commands on the exact same data and getting different results, then something is seriously wrong, and none of the outputs can be trusted. If you did not change the base categories, the data, or the commands, you should get the same results. I suggest you post here the exact commands you ran (don't skip anything in between) and the exact, full output you are getting.

        It is premature to look into your second question if the basic model is not yet nailed down.



        • #5
          *Column 1 - OLS with only yearly controls*
          reg schstdks4_cappedpts i.year i.earlyconverters#i.afterconversion, robust

          Linear regression                               Number of obs     =      1,264
                                                          F(14, 1249)       =      20.44
                                                          Prob > F          =     0.0000
                                                          R-squared         =     0.1500
                                                          Root MSE          =     .36319

          -------------------------------------------------------------------------------------------------
                                          |               Robust
                      schstdks4_cappedpts |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          --------------------------------+----------------------------------------------------------------
                                     year |
                                    2003  |  -.2799424   .2311619    -1.21   0.226    -.7334508     .173566
                                    2004  |  -.2039077    .214009    -0.95   0.341    -.6237646    .2159492
                                    2005  |   -.200673    .208557    -0.96   0.336    -.6098338    .2084878
                                    2006  |  -.1890817   .2066744    -0.91   0.360     -.594549    .2163857
                                    2007  |  -.1416356   .2068891    -0.68   0.494    -.5475241     .264253
                                    2008  |  -.0700349   .2070069    -0.34   0.735    -.4761545    .3360846
                                    2009  |   -.017034   .2090639    -0.08   0.935    -.4271891    .3931212
                                    2010  |   .0354931   .2118959     0.17   0.867    -.3802181    .4512043
                                    2011  |   .1322136    .211697     0.62   0.532    -.2831075    .5475346
                                    2012  |   .1377421   .2122765     0.65   0.517    -.2787157    .5541999
                                    2013  |     .13946   .2152425     0.65   0.517    -.2828169    .5617368
                                          |
          earlyconverters#afterconversion |
                                     0 1  |   .0639332   .0505085     1.27   0.206    -.0351578    .1630241
                                     1 0  |   .1236718   .0343608     3.60   0.000     .0562605     .191083
                                     1 1  |   .1886763    .040051     4.71   0.000     .1101016    .2672511
                                          |
                                    _cons |  -.3158101   .2064661    -1.53   0.126    -.7208688    .0892485
          -------------------------------------------------------------------------------------------------

          .
          . *Column 4 - OLS with other control variables*
          . reg schstdks4_cappedpts i.year i.earlyconverters#i.afterconversion schks2_eng_exp schks2_eng_abv schks2_mat_exp schks2_mat_abv schks2_sci_exp schks2_sci_abv schfemale schfsm schsen schwhite schblack schasian, robust

          Linear regression                               Number of obs     =      1,264
                                                          F(26, 1237)       =      91.72
                                                          Prob > F          =     0.0000
                                                          R-squared         =     0.6161
                                                          Root MSE          =     .24526

          -------------------------------------------------------------------------------------------------
                                          |               Robust
                      schstdks4_cappedpts |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          --------------------------------+----------------------------------------------------------------
                                     year |
                                    2003  |   -.103877   .1275629    -0.81   0.416    -.3541406    .1463866
                                    2004  |  -.1448208   .1180761    -1.23   0.220    -.3764724    .0868307
                                    2005  |  -.2384776   .1152587    -2.07   0.039    -.4646018   -.0123534
                                    2006  |  -.2014004   .1183321    -1.70   0.089    -.4335543    .0307534
                                    2007  |  -.1801432   .1166253    -1.54   0.123    -.4089485     .048662
                                    2008  |  -.1050502   .1172407    -0.90   0.370    -.3350628    .1249623
                                    2009  |  -.0631218   .1184605    -0.53   0.594    -.2955276     .169284
                                    2010  |  -.0228516   .1215843    -0.19   0.851    -.2613859    .2156827
                                    2011  |   .0204458   .1203059     0.17   0.865    -.2155804     .256472
                                    2012  |    .000116   .1219015     0.00   0.999    -.2390405    .2392726
                                    2013  |   .0026299   .1255981     0.02   0.983    -.2437789    .2490387
                                          |
          earlyconverters#afterconversion |
                                     0 1  |    .051855   .0368492     1.41   0.160    -.0204388    .1241489
                                     1 0  |   .0701312   .0216144     3.24   0.001     .0277263    .1125362
                                     1 1  |   .0923084   .0266726     3.46   0.001     .0399799    .1446369
                                          |
                           schks2_eng_exp |   .8312307   .1904158     4.37   0.000      .457657    1.204804
                           schks2_eng_abv |   1.481489   .2073127     7.15   0.000     1.074766    1.888213
                           schks2_mat_exp |   .6847628   .1934076     3.54   0.000     .3053197    1.064206
                           schks2_mat_abv |   .8014206   .2349212     3.41   0.001     .3405326    1.262309
                           schks2_sci_exp |  -.2579983   .2804386    -0.92   0.358    -.8081862    .2921895
                           schks2_sci_abv |    -.11619   .2360023    -0.49   0.623    -.5791991    .3468191
                                schfemale |   .1705866   .0608823     2.80   0.005     .0511426    .2900306
                                   schfsm |  -.2299195   .0758098    -3.03   0.002    -.3786494   -.0811896
                                   schsen |  -.1429862    .063454    -2.25   0.024    -.2674755   -.0184968
                                 schwhite |  -.0256358   .1326983    -0.19   0.847    -.2859745    .2347029
                                 schblack |   .3329815   .1777665     1.87   0.061    -.0157757    .6817386
                                 schasian |   .1717201   .1668003     1.03   0.303    -.1555226    .4989629
                                    _cons |  -1.163177   .3101663    -3.75   0.000    -1.771687   -.5546668
          -------------------------------------------------------------------------------------------------

          .
          end of do-file

          . reg schstdks4_cappedpts i.year i.earlyconverters##i.afterconversion schks2_eng_exp schks2_eng_abv schks2_mat_exp schks2_mat_abv schks2_sci_exp schks2_sci_abv schfemale schfsm schsen schwhite schblack schasian, robust

          Linear regression                               Number of obs     =      1,264
                                                          F(26, 1237)       =      91.72
                                                          Prob > F          =     0.0000
                                                          R-squared         =     0.6161
                                                          Root MSE          =     .24526

          -------------------------------------------------------------------------------------------------
                                          |               Robust
                      schstdks4_cappedpts |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
          --------------------------------+----------------------------------------------------------------
                                     year |
                                    2003  |   -.103877   .1275629    -0.81   0.416    -.3541406    .1463866
                                    2004  |  -.1448208   .1180761    -1.23   0.220    -.3764724    .0868307
                                    2005  |  -.2384776   .1152587    -2.07   0.039    -.4646018   -.0123534
                                    2006  |  -.2014004   .1183321    -1.70   0.089    -.4335543    .0307534
                                    2007  |  -.1801432   .1166253    -1.54   0.123    -.4089485     .048662
                                    2008  |  -.1050502   .1172407    -0.90   0.370    -.3350628    .1249623
                                    2009  |  -.0631218   .1184605    -0.53   0.594    -.2955276     .169284
                                    2010  |  -.0228516   .1215843    -0.19   0.851    -.2613859    .2156827
                                    2011  |   .0204458   .1203059     0.17   0.865    -.2155804     .256472
                                    2012  |    .000116   .1219015     0.00   0.999    -.2390405    .2392726
                                    2013  |   .0026299   .1255981     0.02   0.983    -.2437789    .2490387
                                          |
                        1.earlyconverters |   .0701312   .0216144     3.24   0.001     .0277263    .1125362
                        1.afterconversion |    .051855   .0368492     1.41   0.160    -.0204388    .1241489
                                          |
          earlyconverters#afterconversion |
                                     1 1  |  -.0296779   .0334932    -0.89   0.376    -.0953876    .0360319
                                          |
                           schks2_eng_exp |   .8312307   .1904158     4.37   0.000      .457657    1.204804
                           schks2_eng_abv |   1.481489   .2073127     7.15   0.000     1.074766    1.888213
                           schks2_mat_exp |   .6847628   .1934076     3.54   0.000     .3053197    1.064206
                           schks2_mat_abv |   .8014206   .2349212     3.41   0.001     .3405326    1.262309
                           schks2_sci_exp |  -.2579983   .2804386    -0.92   0.358    -.8081862    .2921895
                           schks2_sci_abv |    -.11619   .2360023    -0.49   0.623    -.5791991    .3468191
                                schfemale |   .1705866   .0608823     2.80   0.005     .0511426    .2900306
                                   schfsm |  -.2299195   .0758098    -3.03   0.002    -.3786494   -.0811896
                                   schsen |  -.1429862    .063454    -2.25   0.024    -.2674755   -.0184968
                                 schwhite |  -.0256358   .1326983    -0.19   0.847    -.2859745    .2347029
                                 schblack |   .3329815   .1777665     1.87   0.061    -.0157757    .6817386
                                 schasian |   .1717201   .1668003     1.03   0.303    -.1555226    .4989629
                                    _cons |  -1.163177   .3101663    -3.75   0.000    -1.771687   -.5546668
          -------------------------------------------------------------------------------------------------

          . margins earlyconverters, dydx(afterconversion)

          Average marginal effects                        Number of obs     =      1,264
          Model VCE    : Robust

          Expression   : Linear prediction, predict()
          dy/dx w.r.t. : 1.afterconversion

          ------------------------------------------------------------------------------------
                             |            Delta-method
                             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------------+----------------------------------------------------------------
           1.afterconversion |
             earlyconverters |
                          0  |    .051855   .0368492     1.41   0.160    -.0204388    .1241489
                          1  |   .0221772   .0303451     0.73   0.465    -.0373563    .0817107
          ------------------------------------------------------------------------------------
          Note: dy/dx for factor levels is the discrete change from the base level.

          . margins afterconversion, dydx(earlyconverters)

          Average marginal effects                        Number of obs     =      1,264
          Model VCE    : Robust

          Expression   : Linear prediction, predict()
          dy/dx w.r.t. : 1.earlyconverters

          ------------------------------------------------------------------------------------
                             |            Delta-method
                             |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
          -------------------+----------------------------------------------------------------
           1.earlyconverters |
             afterconversion |
                          0  |   .0701312   .0216144     3.24   0.001     .0277263    .1125362
                          1  |   .0404534   .0260016     1.56   0.120    -.0105587    .0914655
          ------------------------------------------------------------------------------------
          Note: dy/dx for factor levels is the discrete change from the base level.
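          The two Column 4 outputs above can be lined up by hand. A minimal check, meant to be run as a do-file, with the coefficients typed in from the log (the scalar names are illustrative, not from the thread): the 1 1 coefficient of the model with # equals the sum of 1.earlyconverters, 1.afterconversion and their interaction in the model with ##, and the ## interaction equals the (1,1) cell minus the (1,0) and (0,1) cells of the # model, each up to rounding in the seventh decimal place.

```stata
* Coefficients copied from the two Column 4 regressions in the log above
* Model with i.earlyconverters#i.afterconversion: each cell vs the (0,0) cell
scalar c01 = .051855       // 0.earlyconverters#1.afterconversion
scalar c10 = .0701312      // 1.earlyconverters#0.afterconversion
scalar c11 = .0923084      // 1.earlyconverters#1.afterconversion

* Model with i.earlyconverters##i.afterconversion
scalar b_early = .0701312  // 1.earlyconverters
scalar b_after = .051855   // 1.afterconversion
scalar b_int = -.0296779   // 1.earlyconverters#1.afterconversion

* (1,1) cell vs (0,0) cell rebuilt from the ## coefficients;
* agrees with c11 up to rounding in the last printed digit
display (scalar(b_early) + scalar(b_after) + scalar(b_int))

* Difference-in-differences rebuilt from the # coefficients;
* agrees with b_int up to rounding in the last printed digit
display (scalar(c11) - scalar(c10) - scalar(c01))
```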



          • #6
            These are not not not not not the same. On one occasion you use i.earlyconverters#i.afterconversion, and in the other you use i.earlyconverters##i.afterconversion. The # and ## are not the same thing. And in most situations (including yours) only the analysis with ## is valid.

            Read -help fvvarlist- to understand the difference between # and ##.



            • #7
              In my initial post I did mention that I used i.earlyconverters#i.afterconversion in the first regression and i.earlyconverters##i.afterconversion in the second, and that I got an identical R-squared, F-statistic and estimates for the other covariates, with only the estimate for 1.earlyconverters#1.afterconversion changing. My initial question was meant to establish the difference between the two specifications and to understand why the results were so different.

              Also, since you said that the margins command shouldn't depend on the parameterization, why do margins earlyconverters, dydx(afterconversion) and margins afterconversion, dydx(earlyconverters) give such different results?

              Thank you



              • #8
                I see you did say # and ## in the first post; I missed it, and I'm sorry. For some reason it didn't catch my eye as it should have. Only in #3 did I perceive it.

                These are not simply reparameterizations of each other. They are different model specifications altogether (and the one using only # is incorrect.) They are, in fact, two models in which one includes some variables not present in the other. There is no reason the results should be the same. The model with ## includes i.earlyconverters and i.afterconversion on their own in addition to their interaction term. The model with # contains only the interaction without the "main" effects. (I prefer to call them constituent effects, because the term "main effect" leads people to misinterpret what they actually mean.) Do read -help fvvarlist- for more information about this.
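                A small illustration of what ## expands to, on simulated data with hypothetical variables a, b and y (not from the thread): i.a##i.b is shorthand for i.a i.b i.a#i.b, so the two -regress- commands below fit the same terms and should report identical coefficients.

```stata
* Simulated data; variable names are hypothetical
clear
set seed 42
set obs 300
generate byte a = runiform() < 0.5
generate byte b = runiform() < 0.5
generate double y = 0.5*a + 0.25*b + 0.5*a*b + rnormal()

* Shorthand: ## adds the constituent terms automatically
regress y i.a##i.b
scalar int_short = _b[1.a#1.b]
scalar a_short = _b[1.a]

* Written out: constituent effects plus the interaction
regress y i.a i.b i.a#i.b
assert reldif(_b[1.a#1.b], scalar(int_short)) < 1e-10
assert reldif(_b[1.a], scalar(a_short)) < 1e-10
```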

                Quote (from #7):
                Also, as you said that the margins command shouldn't depend on the parameterization, why is it that margins earlyconverters, dydx(afterconversion) and margins afterconversion, dydx(earlyconverters) give such different results?
                These are not different parameterizations of each other, either. These are different questions, so they have different answers. The first asks: in each category of early converters, how much do things change after conversion vs before. The second asks: before (and then, too, after) conversion, how much difference is there between earlyconverters and controls.
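                Both sets of -margins- results in #5 can be rebuilt from the ## coefficients, which makes the two questions concrete. A minimal sketch with the coefficients typed in from the log (scalar names are illustrative): because the model is linear and the other covariates enter additively, each marginal effect is a simple sum of coefficients, and in both tables the second row differs from the first by the same interaction coefficient, -.0296779, up to rounding.

```stata
* Coefficients copied from the ## regression (Column 4) in #5
scalar b_early = .0701312  // 1.earlyconverters
scalar b_after = .051855   // 1.afterconversion
scalar b_int = -.0296779   // 1.earlyconverters#1.afterconversion

* Question 1: margins earlyconverters, dydx(afterconversion)
* After minus before conversion, within each group
display "earlyconverters = 0: " (scalar(b_after))
display "earlyconverters = 1: " (scalar(b_after) + scalar(b_int))

* Question 2: margins afterconversion, dydx(earlyconverters)
* Early minus late converters, before and then after conversion
display "afterconversion = 0: " (scalar(b_early))
display "afterconversion = 1: " (scalar(b_early) + scalar(b_int))
```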



                • #9
                  That makes a lot more sense now. Thank you so much for the clarification.
                  With regard to the second concern about yearly effects, I know you have recommended using margins, but if I were to use a variant of this difference-in-differences specification, what code would you suggest? Extending your suggestion by replacing afterconversion with treat_year, I get:
                  Code:
                  reg schstdks4_cappedpts i.year i.earlyconverters##i.treat_year schks2_eng_exp schks2_eng_abv schks2_mat_exp schks2_mat_abv schks2_sci_exp schks2_sci_abv schfemale schfsm schsen schwhite schblack schasian, robust baselevels
                  note: 1.earlyconverters#7.treat_year omitted because of collinearity
                  
                  Linear regression                               Number of obs     =      1,264
                                                                  F(37, 1226)       =      66.80
                                                                  Prob > F          =     0.0000
                                                                  R-squared         =     0.6212
                                                                  Root MSE          =     .24471
                  
                  --------------------------------------------------------------------------------------------
                                             |               Robust
                         schstdks4_cappedpts |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                  ---------------------------+----------------------------------------------------------------
                                        year |
                                       2002  |          0  (base)
                                       2003  |  -.1024338   .1266326    -0.81   0.419    -.3508745    .1460069
                                       2004  |  -.1475457    .120105    -1.23   0.220    -.3831798    .0880884
                                       2005  |  -.2397768    .117702    -2.04   0.042    -.4706965   -.0088572
                                       2006  |  -.2032563   .1270421    -1.60   0.110    -.4525002    .0459877
                                       2007  |  -.1839323   .1267776    -1.45   0.147    -.4326574    .0647929
                                       2008  |  -.1312449     .12989    -1.01   0.312    -.3860763    .1235865
                                       2009  |  -.1303111   .1318537    -0.99   0.323    -.3889951    .1283728
                                       2010  |  -.1185822   .1347214    -0.88   0.379    -.3828922    .1457278
                                       2011  |  -.1446667   .1368345    -1.06   0.291    -.4131223     .123789
                                       2012  |  -.1982916   .1411805    -1.40   0.160    -.4752738    .0786905
                                       2013  |  -.1482692   .1624127    -0.91   0.361    -.4669068    .1703684
                                             |
                             earlyconverters |
                                          0  |          0  (base)
                                          1  |   .0643754   .0519852     1.24   0.216    -.0376145    .1663653
                                             |
                                  treat_year |
                                          0  |          0  (base)
                                          1  |  -.0128218    .052473    -0.24   0.807    -.1157686    .0901249
                                          2  |   .0284751   .0601388     0.47   0.636    -.0895112    .1464613
                                          3  |    .080384   .0667417     1.20   0.229    -.0505566    .2113245
                                          4  |   .1217063   .0741043     1.64   0.101     -.023679    .2670916
                                          5  |   .2249027   .0753269     2.99   0.003     .0771188    .3726866
                                          6  |   .2663198   .0857917     3.10   0.002     .0980049    .4346346
                                          7  |   .2025586   .0760964     2.66   0.008     .0532649    .3518522
                                             |
                  earlyconverters#treat_year |
                                        1 1  |   .0216966    .071522     0.30   0.762    -.1186224    .1620156
                                        1 2  |  -.0051781   .0718663    -0.07   0.943    -.1461727    .1358166
                                        1 3  |  -.0729474   .0764229    -0.95   0.340    -.2228815    .0769867
                                        1 4  |   -.098835   .0793353    -1.25   0.213    -.2544829     .056813
                                        1 5  |  -.1309482   .0759802    -1.72   0.085    -.2800138    .0181175
                                        1 6  |   -.092823   .0819557    -1.13   0.258     -.253612     .067966
                                        1 7  |          0  (omitted)
                                             |
                              schks2_eng_exp |   .8415962    .190485     4.42   0.000     .4678835    1.215309
                              schks2_eng_abv |   1.448585   .2060547     7.03   0.000     1.044326    1.852844
                              schks2_mat_exp |   .7042083   .1957173     3.60   0.000     .3202304    1.088186
                              schks2_mat_abv |   .8362858    .239637     3.49   0.001     .3661418     1.30643
                              schks2_sci_exp |  -.3029916   .2824497    -1.07   0.284    -.8571299    .2511468
                              schks2_sci_abv |  -.1603757   .2408857    -0.67   0.506    -.6329696    .3122181
                                   schfemale |   .1685228     .06051     2.79   0.005     .0498082    .2872374
                                      schfsm |  -.2545218   .0767447    -3.32   0.001    -.4050872   -.1039564
                                      schsen |  -.1492971    .062841    -2.38   0.018    -.2725849   -.0260094
                                    schwhite |  -.0063786   .1322071    -0.05   0.962    -.2657558    .2529985
                                    schblack |   .3288026   .1771433     1.86   0.064    -.0187349    .6763401
                                    schasian |   .1924931   .1663859     1.16   0.248    -.1339396    .5189257
                                       _cons |  -1.149123   .3028266    -3.79   0.000    -1.743239    -.555007
                  --------------------------------------------------------------------------------------------
                  How do I interpret the coefficients on 1.earlyconverters#i.treat_year, and why is (1,7) omitted?
                  Thank you




                  • #10
                    The reason 1.earlyconverters#7.treat_year is omitted is that there is a collinearity among earlyconverters, i.treat_year, earlyconverters#treat_year and i.year, so you can't have them all. Because of the order in which you listed them in the -regress- command, Stata chose to omit 1.earlyconverters#7.treat_year, as it came last. If you want to keep that one in, you have to get rid of something else. You could either explicitly omit one of the year indicators by specifying, say, i(2002/2012).year (explicitly omitting 2013.year), or you could put i.year after i.earlyconverters##treat_year in the -regress- command, and Stata will eliminate a year indicator (probably 2013).

                    In principle, there is no need to use -margins-; you can get those differences as linear combinations of the coefficients using -lincom-. But it's tedious and error prone, so I really recommend against it in practice. The reason you are getting "not estimable" results is probably that you have some combination of year, earlyconverters, and treat_year for which there are no observations, or at least none that don't get omitted due to missing values of some other variable. You can force -margins- to give you an answer by adding the -noestimcheck- option to the command. I think that is better than figuring out the -lincom-s.
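                    A sketch of the -margins- and -lincom- routes described above, on simulated data with hypothetical variable names (early, treat_year, y). It deliberately has no year effects or covariates, so it sidesteps the collinearity and estimability problems in the actual data. With treat_year = 0 as the base, the early-vs-late difference at treat_year = k is _b[1.early] + _b[1.early#k.treat_year]; -margins- reports that difference for every k at once, while -lincom- has to be written out one period at a time.

```stata
* Simulated data; names and effect sizes are hypothetical
clear
set seed 2017
set obs 2000
generate byte early = runiform() < 0.5
generate byte treat_year = floor(8*runiform())   // values 0 to 7
* Hypothetical effect: early converters gain 0.1 from treat_year 4 onward
generate double y = 0.05*treat_year + 0.1*early*(treat_year >= 4) + rnormal(0, 0.3)

regress y i.early##i.treat_year, robust

* Early-vs-late difference within each value of treat_year
margins treat_year, dydx(early)

* The same difference for treat_year = 3, as a linear combination
lincom 1.early + 1.early#3.treat_year
```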



                    • #11
                      Thank you so much for your guidance, Clyde. I understand where I was going wrong and can now proceed with my work having corrected those mistakes. I am extremely grateful for your help.

