  • Margins and diff-in-diff estimates

    Hi there,

    I've been trying to use margins to estimate the pre- and post-treatment trends in a difference-in-differences model on a state-level intervention using data from several years and states. I have run into a bit of a quirk with margins, and am curious as to whether anyone has an insight into it.

    Given the following regression,
    Code:
    regress outcome did treat time
    where treat is an indicator for treatment status, and time is an indicator for pre/post treatment period, and did is an interaction term between the two, I have used margins like this:
    Code:
    margins, over(treat time)
    margins, over(treat time) pwcompare
    margins, over(r.treat r.time)
    The third line gives a difference-in-differences estimate that is close to, but not the same as, the coefficient on did in the original regression, even though the two should match. Because of this, I don't trust the results from the other two lines either. The issue goes away when I code it this way instead:
    Code:
    margins treat, over(time)
    margins treat, over(time) pwcompare
    margins r.treat, over(r.time)
    This wouldn't be an issue at all, except in my preferred regression I use state and quarter-year fixed effects and omit the main effects treat and time from the regression, so using the latter coding scheme for margins doesn't work.

    Does anyone know why this happens, and if there's any way around it? I'd love to be able to present regression-adjusted pre- and post-treatment means using my fixed effects model. Thanks!

  • #2
    That code cannot work. Discard whatever you got from it: pure garbage.

    To use -margins- with an interaction model you must run the regression with factor variable notation. In the code you wrote, -margins- does not know that did is the interaction between treat and time, so it handles these three variables as if they were unrelated to each other. Also, while the use of the -over()- option is allowable, it has a specific purpose and it is not what is usually wanted in this context. Try this:

    Code:
    regress outcome i.treat##i.time
    margins treat#time
    margins treat, dydx(time)
    margins time, dydx(treat)
    The -margins- command is one of modern Stata's best features, IMO. Do familiarize yourself with it by reading the excellent Richard Williams' https://www3.nd.edu/~rwilliam/stats/Margins01.pdf. It's the best introduction to -margins- out there and it contains lots of worked examples, including some directly applicable to your situation. After that, when you have time, learn about the more advanced features of -margins- from the chapter in the PDF documentation.
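
    As a minimal sketch of the commands above, the following simulates a two-group, two-period data set and runs them. The seed, sample size, and effect sizes are made up for illustration; only the variable names follow this thread. With no other covariates in the model, the final contrast should match the 1.treat#1.time coefficient from -regress-.

    Code:
    * Simulated data (hypothetical values, for illustration only)
    clear
    set seed 12345
    set obs 400
    generate byte treat = mod(_n, 2)
    generate byte time  = _n > 200
    generate outcome = 1 + 0.5*treat + 0.3*time + 0.8*treat*time + rnormal()

    * Interaction model in factor-variable notation
    regress outcome i.treat##i.time

    * Adjusted means for the four treat-by-time cells
    margins treat#time

    * Difference-in-differences as a contrast of those cell means
    margins r.treat#r.time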


    • #3
      Great, thank you for the recommendation!

      A quick related question—my preferred regression includes state and quarter-year fixed effects that cause the main effects time and treat to be omitted due to collinearity when using the factor variable notation.

      But even when I use the single -#- notation along with the two fixed effects, either the 1.time#1.treat or a state-quarter interaction is dropped because of collinearity. Thus I opted for the manually-created interaction, rather than factor notation. Does my inclusion of fixed effects eliminate my ability to use -margins-, then?

      On a search of the two documents you suggested, I couldn't find any discussion of using fixed effects, so I thought I'd check with you. Thanks again!


      • #4
        my preferred regression includes state and quarter-year fixed effects that cause the main effects time and treat to be omitted due to collinearity when using the factor variable notation.

        But even when I use the single -#- notation along with the two fixed effects, either the 1.time#1.treat or a state-quarter interaction is dropped because of collinearity.
        That is exactly what is supposed to happen, and it is not a problem. The treat variable is collinear with the state effects, and the time variable is collinear with the quarter effects. Consequently, treat and time drop. The interaction term can survive, but it is now collinear with one of the quarter effects, so either it or one of those must drop. Remember that when you try to include collinear terms in the model, the model is unidentified, so no results can be obtained (or, rather, an infinite number of results can be obtained, with no basis for choosing among them) until the collinearity is broken. There are many ways to do that (an infinite number, actually), but the simplest and commonest is to remove one of the collinear variables. Since you are interested in the time#treat interaction term, it makes sense to just let one of the quarter indicators go: you're not interested in those; they are nuisance variables, included only to adjust for shocks.

        In a fixed effects model like this, with the colinearities, you may run into estimability problems with -margins-. But it is reasonable in this situation to deal with that by using the -noestimcheck- option with your -margins- command.
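
        A minimal sketch of that option, using the variable names that appear later in this thread (hbi, active, _state, yearquarter) and leaving out the other covariates for brevity. It assumes the interaction is entered in factor-variable notation so that -margins- recognizes it. The levels reported this way may depend on which collinear terms were dropped, so they deserve a cautious reading.

        Code:
        * Two-way fixed effects with the interaction in factor-variable notation
        regress outcome i.hbi#i.active i._state i.yearquarter, vce(cluster _state)

        * Cell means, skipping the estimability check
        margins hbi#active, noestimcheck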


        • #5
          Excellent—thank you, Clyde! That clears it up beautifully.


          • #6
            Hi Clyde, an update—does it matter if the indicator dropped was actually a state indicator, not a quarter one? I found a Statalist post that seems to suggest it does. Relatedly, is there any way to specify that Stata should drop an indicator in lieu of the desired interaction term, or will I need to manually create the set of dummies?

            Thanks again!


            • #7
              I'm not sure what you're saying here. Why don't you post the outputs from the different models so we can discuss this in concrete terms.


              • #8
                Thanks—here's what I'm getting, where "hbi" is the treatment indicator and "active" the pre/post indicator:
                Code:
                . regress outcome hbi#active i._state i.yearquarter married i._race_g1 i.educa age female i.income2 employ1 cell, cluster(_state)
                note: 1.hbi#1.active omitted because of collinearity
                
                Linear regression                                      Number of obs =   65917
                                                                       F(  9,    10) =       .
                                                                       Prob > F      =       .
                                                                       R-squared     =  0.0666
                                                                       Root MSE      =  .47205
                
                                                        (Std. Err. adjusted for 11 clusters in _state)
                --------------------------------------------------------------------------------------
                                     |               Robust
                             outcome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                ---------------------+----------------------------------------------------------------
                          hbi#active |
                                0 1  |   .0282346   .0095162     2.97   0.014     .0070312    .0494379
                                1 0  |   -.056729   .0120131    -4.72   0.001     -.083496   -.0299621
                                1 1  |          0  (omitted)
                                     |
                              _state |
                                 IN  |   .0245113   .0124889     1.96   0.078    -.0033157    .0523382
                                 IA  |   .0837916   .0093889     8.92   0.000     .0628718    .1047114
                                 KS  |   .0044528   .0040407     1.10   0.296    -.0045503     .013456
                                 MI  |   .0678174   .0108236     6.27   0.000     .0437009    .0919339
                                 MO  |  -.0134804   .0052468    -2.57   0.028    -.0251709   -.0017899
                                 NE  |  -.0538996   .0046775   -11.52   0.000    -.0643217   -.0434775
                                 ND  |  -.0359445   .0041456    -8.67   0.000    -.0451815   -.0267075
                                 OH  |    .046565   .0030205    15.42   0.000     .0398349     .053295
                                 PA  |   .0617331   .0031023    19.90   0.000     .0548207    .0686455
                                 SD  |    .043543   .0061818     7.04   0.000      .029769    .0573169
                                     |
                         yearquarter |
                            2011 Q2  |  -.0021998   .0118787    -0.19   0.857    -.0286673    .0242676
                            2011 Q3  |  -.0192766   .0112152    -1.72   0.116    -.0442656    .0057123
                            2011 Q4  |   .0022903   .0093385     0.25   0.811    -.0185171    .0230977
                            2012 Q1  |   .0039056   .0096874     0.40   0.695    -.0176794    .0254905
                            2012 Q2  |   .0094244   .0122209     0.77   0.458    -.0178055    .0366543
                            2012 Q3  |   .0067118   .0106521     0.63   0.543    -.0170225    .0304461
                            2012 Q4  |  -.0008587   .0128453    -0.07   0.948    -.0294798    .0277624
                            2013 Q1  |    .026498   .0166775     1.59   0.143    -.0106619    .0636578
                            2013 Q2  |   .0243838   .0156107     1.56   0.149    -.0103989    .0591665
                            2013 Q3  |    .017049    .013788     1.24   0.245    -.0136726    .0477706
                            2013 Q4  |   .0200471   .0114105     1.76   0.109    -.0053772    .0454713
                            2014 Q1  |   .0049145   .0125719     0.39   0.704    -.0230974    .0329264
                            2014 Q2  |    .018825   .0106244     1.77   0.107    -.0048477    .0424977
                            2014 Q3  |    .040089   .0112053     3.58   0.005      .015122    .0650559
                            2014 Q4  |   .0450431   .0178402     2.52   0.030     .0052928    .0847935
                            2015 Q1  |   .0317327   .0103781     3.06   0.012     .0086088    .0548567
                            2015 Q2  |   .0373208   .0113494     3.29   0.008     .0120327    .0626089
                            2015 Q3  |   .0346998   .0112484     3.08   0.012     .0096369    .0597627
                            2015 Q4  |   .0527533   .0104404     5.05   0.000     .0294906    .0760159
                            2016 Q1  |   .0590476   .0144316     4.09   0.002      .026892    .0912032
                            2016 Q2  |   .0584476   .0174996     3.34   0.007      .019456    .0974391
                            2016 Q3  |   .0480533   .0204053     2.35   0.040     .0025876    .0935191
                            2016 Q4  |   .0624481   .0144331     4.33   0.001     .0302893     .094607
                                     |
                             married |  -.0053387   .0048409    -1.10   0.296     -.016125    .0054476
                                     |
                            _race_g1 |
                Black, non-Hispanic  |   .1337631   .0085096    15.72   0.000     .1148026    .1527236
                           Hispanic  |   .0149281   .0110136     1.36   0.205    -.0096118    .0394681
                              Other  |    .036073   .0104487     3.45   0.006      .012792    .0593541
                                     |
                               educa |
                             HS/GED  |   .0159497   .0076921     2.07   0.065    -.0011893    .0330888
                       Some college  |   .0186126   .0070713     2.63   0.025     .0028568    .0343684
                   College graduate  |   .0069953    .009135     0.77   0.462    -.0133588    .0273495
                                     |
                                 age |   .0043844   .0002259    19.41   0.000     .0038812    .0048877
                              female |    .088177   .0048183    18.30   0.000     .0774411     .098913
                                     |
                             income2 |
                                  2  |   .0084574   .0055734     1.52   0.160     -.003961    .0208758
                                  3  |  -.0017573   .0073165    -0.24   0.815    -.0180595    .0145449
                                  4  |   .0032921   .0054634     0.60   0.560     -.008881    .0154653
                                  5  |   .0060964   .0097492     0.63   0.546    -.0156263    .0278191
                                  6  |  -.0012045   .0276123    -0.04   0.966    -.0627286    .0603196
                                  7  |  -.1514255   .0863456    -1.75   0.110    -.3438154    .0409643
                                     |
                             employ1 |  -.1071173   .0054861   -19.53   0.000    -.1193411   -.0948936
                                cell |  -.0296031    .003029    -9.77   0.000    -.0363522    -.022854
                               _cons |    .349979   .0111188    31.48   0.000     .3252047    .3747533
                --------------------------------------------------------------------------------------
                To complicate things more, my final model compares two different treatment groups to one control group using two interaction terms, but now one of the interactions is collinear with the other (tradexp is the other treatment indicator), as below:
                Code:
                . regress outcome hbi#active tradexp#active i._state i.yearquarter married i._race_g1 i.educa age female i.income2 employ1 cell, cluster(_state)
                note: 1.hbi#1.active omitted because of collinearity
                note: 0b.tradexp#1.active omitted because of collinearity
                note: 1.tradexp#0b.active omitted because of collinearity
                note: 1.tradexp#1.active omitted because of collinearity
                
                Linear regression                                      Number of obs =   65917
                                                                       F(  9,    10) =       .
                                                                       Prob > F      =       .
                                                                       R-squared     =  0.0666
                                                                       Root MSE      =  .47205
                
                                                        (Std. Err. adjusted for 11 clusters in _state)
                --------------------------------------------------------------------------------------
                                     |               Robust
                             outcome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                ---------------------+----------------------------------------------------------------
                          hbi#active |
                                0 1  |   .0282346   .0095162     2.97   0.014     .0070312    .0494379
                                1 0  |   -.056729   .0120131    -4.72   0.001     -.083496   -.0299621
                                1 1  |          0  (omitted)
                                     |
                      tradexp#active |
                                0 1  |          0  (omitted)
                                1 0  |          0  (omitted)
                                1 1  |          0  (omitted)
                                     |
                              _state |
                                 IN  |   .0245113   .0124889     1.96   0.078    -.0033157    .0523382
                                 IA  |   .0837916   .0093889     8.92   0.000     .0628718    .1047114
                                 KS  |   .0044528   .0040407     1.10   0.296    -.0045503     .013456
                                 MI  |   .0678174   .0108236     6.27   0.000     .0437009    .0919339
                                 MO  |  -.0134804   .0052468    -2.57   0.028    -.0251709   -.0017899
                                 NE  |  -.0538996   .0046775   -11.52   0.000    -.0643217   -.0434775
                                 ND  |  -.0359445   .0041456    -8.67   0.000    -.0451815   -.0267075
                                 OH  |    .046565   .0030205    15.42   0.000     .0398349     .053295
                                 PA  |   .0617331   .0031023    19.90   0.000     .0548207    .0686455
                                 SD  |    .043543   .0061818     7.04   0.000      .029769    .0573169
                                     |
                         yearquarter |
                            2011 Q2  |  -.0021998   .0118787    -0.19   0.857    -.0286673    .0242676
                            2011 Q3  |  -.0192766   .0112152    -1.72   0.116    -.0442656    .0057123
                            2011 Q4  |   .0022903   .0093385     0.25   0.811    -.0185171    .0230977
                            2012 Q1  |   .0039056   .0096874     0.40   0.695    -.0176794    .0254905
                            2012 Q2  |   .0094244   .0122209     0.77   0.458    -.0178055    .0366543
                            2012 Q3  |   .0067118   .0106521     0.63   0.543    -.0170225    .0304461
                            2012 Q4  |  -.0008587   .0128453    -0.07   0.948    -.0294798    .0277624
                            2013 Q1  |    .026498   .0166775     1.59   0.143    -.0106619    .0636578
                            2013 Q2  |   .0243838   .0156107     1.56   0.149    -.0103989    .0591665
                            2013 Q3  |    .017049    .013788     1.24   0.245    -.0136726    .0477706
                            2013 Q4  |   .0200471   .0114105     1.76   0.109    -.0053772    .0454713
                            2014 Q1  |   .0049145   .0125719     0.39   0.704    -.0230974    .0329264
                            2014 Q2  |    .018825   .0106244     1.77   0.107    -.0048477    .0424977
                            2014 Q3  |    .040089   .0112053     3.58   0.005      .015122    .0650559
                            2014 Q4  |   .0450431   .0178402     2.52   0.030     .0052928    .0847935
                            2015 Q1  |   .0317327   .0103781     3.06   0.012     .0086088    .0548567
                            2015 Q2  |   .0373208   .0113494     3.29   0.008     .0120327    .0626089
                            2015 Q3  |   .0346998   .0112484     3.08   0.012     .0096369    .0597627
                            2015 Q4  |   .0527533   .0104404     5.05   0.000     .0294906    .0760159
                            2016 Q1  |   .0590476   .0144316     4.09   0.002      .026892    .0912032
                            2016 Q2  |   .0584476   .0174996     3.34   0.007      .019456    .0974391
                            2016 Q3  |   .0480533   .0204053     2.35   0.040     .0025876    .0935191
                            2016 Q4  |   .0624481   .0144331     4.33   0.001     .0302893     .094607
                                     |
                             married |  -.0053387   .0048409    -1.10   0.296     -.016125    .0054476
                                     |
                            _race_g1 |
                Black, non-Hispanic  |   .1337631   .0085096    15.72   0.000     .1148026    .1527236
                           Hispanic  |   .0149281   .0110136     1.36   0.205    -.0096118    .0394681
                              Other  |    .036073   .0104487     3.45   0.006      .012792    .0593541
                                     |
                               educa |
                             HS/GED  |   .0159497   .0076921     2.07   0.065    -.0011893    .0330888
                       Some college  |   .0186126   .0070713     2.63   0.025     .0028568    .0343684
                   College graduate  |   .0069953    .009135     0.77   0.462    -.0133588    .0273495
                                     |
                                 age |   .0043844   .0002259    19.41   0.000     .0038812    .0048877
                              female |    .088177   .0048183    18.30   0.000     .0774411     .098913
                                     |
                             income2 |
                                  2  |   .0084574   .0055734     1.52   0.160     -.003961    .0208758
                                  3  |  -.0017573   .0073165    -0.24   0.815    -.0180595    .0145449
                                  4  |   .0032921   .0054634     0.60   0.560     -.008881    .0154653
                                  5  |   .0060964   .0097492     0.63   0.546    -.0156263    .0278191
                                  6  |  -.0012045   .0276123    -0.04   0.966    -.0627286    .0603196
                                  7  |  -.1514255   .0863456    -1.75   0.110    -.3438154    .0409643
                                     |
                             employ1 |  -.1071173   .0054861   -19.53   0.000    -.1193411   -.0948936
                                cell |  -.0296031    .003029    -9.77   0.000    -.0363522    -.022854
                               _cons |    .349979   .0111188    31.48   0.000     .3252047    .3747533
                --------------------------------------------------------------------------------------


                • #9
                  One piece of potentially relevant information—for the control group, I have active coded as 0 for every time period (on some advice I was given for when I created the manual interaction terms). Not sure if that's an issue, but from what I understand that's non-conventional.


                  • #10
                    OK. Something is wrong here. You should not be getting 1.hbi#1.active omitted because of collinearity. I cannot tell just what the cause of this is. It may be that some coincidental pattern of missing values for other variables in the regression is causing the combination of hbi = 1 and active = 1 to occur only under some conditions that can be perfectly linearly predicted. Or it may be that your hbi or active variable is coded incorrectly. So you're going to have to look into that. The quickest way to figure it out is:

                    Code:
                    gen interaction = 1.hbi#1.active
                    
                    regress interaction 0.hbi#1.active 1.hbi#0.active i._state i.yearquarter married ///
                        i._race_g1 i.educa age female i.income2 employ1 cell if !missing(outcome)
                    You will get a regression with R2 = 1, and the coefficients will show you exactly what the linear relationship is. Most likely, most of the coefficients will be zero (or within a very small numerical error of zero) and the others will be 1 (or within a very small numerical error of that), and you can probably figure out from that relationship what is going on.

                    Added: Crossed with #9 which gives the key! By coding active = 0 at all times in the control group you have sabotaged your model. Disregard what I wrote above. It's not only non-conventional, it's wrong. You were given bad advice.

                    Is this one of those situations where different entities in the treatment group underwent the "treatment" in the same year? If so, you need to set active = 1 in the control group in the same years that the treatment group were active.
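
                    A sketch of that recoding for the common-timing case. It assumes yearquarter is a quarterly (%tq) date variable and that treatment began in 2014 Q1; both are assumptions for illustration, as is the new variable name post. The other covariates are omitted.

                    Code:
                    * post = 1 from the (hypothetical) start quarter onward, in every state
                    generate byte post = yearquarter >= tq(2014q1) if !missing(yearquarter)

                    * Both hbi groups should now appear in both periods
                    tabulate hbi post

                    * Classic DID: the treatment effect is the 1.hbi#1.post coefficient
                    regress outcome i.hbi##i.post, vce(cluster _state)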

                    If the different entities in the treatment group became active in different years, then you do not have a basic DID data set. You need to do something different, a generalized DiD model. See https://www.ipr.northwestern.edu/wor.../Day%204.2.pdf for more information.
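
                    For the staggered case, a minimal sketch of one common generalized DID setup: a single indicator that switches on when each state's treatment starts, plus state and period fixed effects. The variable start_q (each treated state's first treated quarter, missing for control states) is hypothetical, and the other covariates are omitted.

                    Code:
                    * 1 only in treated states, from their own start quarter onward
                    generate byte hbi_active = !missing(start_q, yearquarter) & yearquarter >= start_q

                    * Treatment effect is the coefficient on 1.hbi_active
                    regress outcome i.hbi_active i._state i.yearquarter, vce(cluster _state)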
                    Last edited by Clyde Schechter; 03 Apr 2018, 19:20.


                    • #11
                      Okay—done. The states that received the hbi treatment (IA, IN, and MI) had coefficients of 1, and the coefficient for 1.hbi#0.active is -1. Otherwise, the coefficients were very close to zero.

                      ADDED: Just to be sure, you're saying to disregard the following?

                      OK. Something is wrong here. You should not be getting 1.hbi#1.active omitted because of collinearity. I cannot tell just what the cause of this is. It may be that some coincidental pattern of missing values for other variables in the regression is causing the combination of hbi = 1 and active = 1 to occur only under some conditions that can be perfectly linearly predicted. Or it may be that your hbi or active variable is coded incorrectly. So you're going to have to look into that. The quickest way to figure it out is:

                      Code:
                      gen interaction = 1.hbi#1.active

                      regress interaction 0.hbi#1.active 1.hbi#0.active i._state i.yearquarter married ///
                          i._race_g1 i.educa age female i.income2 employ1 cell if !missing(outcome)
                      You will get a regression with R2 = 1, and the coefficients will show you exactly what the linear relationship is. Most likely, most of the coefficients will be zero (or within a very small numerical error of zero) and the others will be 1 (or within a very small numerical error of that), and you can probably figure out from that relationship what is going on.
                      And the way you've explained to code "active" for control states makes a lot more sense. I will read up on the generalized DID model, since the treatment does not occur in the same time period for each state.

                      Really appreciate the help!
                      Last edited by Daniel Nelson; 03 Apr 2018, 20:01.


                      • #12
                        This is consistent with your added information in #9 being the root of the problem, on the assumption that IA, IN, and MI are your treatment group states.


                        • #13
                          Hi Clyde (and others!),

                          I'm revisiting a question related to the above—the model is coded as a generalized DID/two-way fixed effects model, with two different categorical treatment types that vary at the state level. The interventions do not occur simultaneously, but are implemented in different quarters of the year. Here is what the regression looks like:
                          Code:
                          regress outcome hbi_active tradexp_active i.yearquarter i._state i._race_g1 married i.educa age female i.income2 employ1 cell [pweight=_llcpwt], cluster(_state)
                          where hbi_active and tradexp_active are variables equal to 1 in periods (quarter-year) where the respective treatment (at the state-level) is active, but otherwise 0 (the interventions do not all begin in the same period).

                          My question is whether there is any way to find regression-adjusted pre- and post-treatment outcome means for each treatment type and the control group. At the moment we are just reporting survey-weighted averages, but are hoping to present regression adjusted versions, as well. I apologize if this is obvious, but I can't seem to figure it out. I appreciate your help!


                          • #14
                            Unfortunately, no, you can't get that from a fixed-effects model. The fixed-effects model is a purely within-group estimator, so it is incapable of estimating group-specific outcomes. If you try the "obvious" approach of running
                            Code:
                            margins hbi_active#tradexp_active
                            Stata will respond that the requested parameters are "not estimable" and refuse to provide results.

                            If you get pushy, you can override the reticence of -margins- by adding the -noestimcheck- option, and Stata will comply and spit out some numbers. The problem is that those numbers are garbage. Just try switching the base (omitted) category for _state and repeating the process: you will get different results. The fixed-effects model is not capable of identifying these parameters.
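
                            A sketch of that check, assuming the two treatment indicators are entered with the i. prefix (which -margins- needs in order to treat them as factors) and leaving out the other covariates for brevity. If the two sets of margins disagree, the levels are not identified by the model.

                            Code:
                            * First state as the base category
                            regress outcome i.hbi_active i.tradexp_active ib(first)._state i.yearquarter ///
                                [pweight=_llcpwt], vce(cluster _state)
                            margins hbi_active#tradexp_active, noestimcheck

                            * Last state as the base category, same margins
                            regress outcome i.hbi_active i.tradexp_active ib(last)._state i.yearquarter ///
                                [pweight=_llcpwt], vce(cluster _state)
                            margins hbi_active#tradexp_active, noestimcheck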


                            • #15
                              Ah, I see. Well, although it's disappointing we won't be able to get those estimates, I'm relieved that I wasn't missing something. No matter how hard I stared at that regression equation I couldn't figure out how any of the variables could add up to what we were looking for.

                              Thanks again for your help!
