  • Correct command for difference-in-differences

    Hi all,

    I am running a difference-in-differences (DiD) regression on two different outcomes: (i) the probability of a specialist visit and (ii) the number of specialist visits in the last month. Outcome (i) is a binary variable, whereas outcome (ii) is a count variable.
    At this point I am confused about the correct way to implement the DiD regressions in Stata.
    For outcome (i), I could run a logit regression in Stata of the form:


    1) logit probvis treatment time interaction covariates, robust

    where 'interaction' represents the interaction between a dummy for the treatment group and a dummy for the period in which the treatment of interest is active. In particular, the coefficient on interaction is the difference-in-differences effect that I am interested in.
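For reference, specification 1) can be sketched as a single-index model (the symbols are illustrative, not from the post): with $T_i$ = treatment, $P_t$ = time, $T_iP_t$ = interaction and $X_{it}$ the covariates,

```latex
\Pr(\text{probvis}_{it}=1 \mid T_i, P_t, X_{it}) = F\left(\beta_0 + \beta_1 T_i + \beta_2 P_t + \beta_3 T_i P_t + X_{it}'\gamma\right)
```

where $F$ is the logistic CDF for logit, the standard normal CDF $\Phi$ for probit, and the identity for a linear probability model. In the linear case without covariates, the estimate of $\beta_3$ equals exactly the difference-in-differences of the four group-by-period sample means $\bar y_{gp}$ (group $g = 1$ treated, period $p = 1$ post):

```latex
\hat\beta_3 = \left(\bar y_{11} - \bar y_{10}\right) - \left(\bar y_{01} - \bar y_{00}\right)
```

In logit or probit, $\beta_3$ lives on the index scale rather than the probability scale, which is why the thread below turns to marginal effects.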
    Alternatively, I know about the command diff, which I would use in the following way:
    2) diff probvis, t(treatment) p(time) cov(covariates) robust


    In this case, the regression results are displayed in a way that highlights the differences over time as well as the final difference-in-differences coefficient.
    Similarly, I can run the same regressions for outcome (ii), except that in this case I should replace logit with an appropriate count data model.

    Could someone please explain the difference between the two methods above? Am I right in thinking that with command 2) I am simply estimating a linear DiD regression, and that I should therefore prefer model 1) for a binary outcome? Does the same argument apply to outcome (ii)?


    Thank you, and please feel free to correct any mistakes in my post.

    Margherita Neri
    Last edited by Margherita Neri; 30 Jul 2016, 03:42.

  • #2
    Non-linear DiD using logit or any of the count data models can be tricky, especially if you want additive marginal effects. If you want to go this route, I would recommend taking a look at Puhani (2012) for guidance.

    Since DiD models are essentially about comparing means, I would not worry too much about using a linear model, because your model is saturated (i.e., it includes all possible interactions).
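A minimal sketch of that point, reusing the Card & Krueger data from post #5: in the saturated linear model without covariates, the interaction coefficient reproduces the DiD of the four cell means. The cutoff fte > 17 is only illustrative, and clustering on id assumes id is the restaurant identifier mentioned in post #8.

```stata
* Sketch: saturated linear probability DiD on the Card & Krueger data from post #5.
* Assumptions: the binary outcome (fte > 17) is illustrative only, and id is the
* restaurant identifier mentioned in post #8.
use http://fmwww.bc.edu/repec/bocode/c/CardKrueger1994.dta, clear
gen binary_y = (fte > 17) if !missing(fte)   // leave missing fte as missing

* Linear model with all group-by-period interactions, SEs clustered by restaurant
regress binary_y i.treated##i.t, vce(cluster id)
scalar b_did = _b[1.treated#1.t]

* The same DiD computed by hand from the four cell means (same estimation sample)
foreach g in 0 1 {
    foreach p in 0 1 {
        summarize binary_y if e(sample) & treated == `g' & t == `p', meanonly
        scalar m`g'`p' = r(mean)
    }
}
scalar did_means = (scalar(m11) - scalar(m10)) - (scalar(m01) - scalar(m00))
display "interaction coefficient: " scalar(b_did) "   DiD of means: " scalar(did_means)
```

Restricting the hand-computed means to e(sample) keeps them on exactly the observations the regression used, in case clustering or missing values drop any rows.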



    • #3
      Thank you, Dimitriy.
      Could you recommend a paper that applies a diff-in-diff approach to a binary outcome?
      Thank you



      • #4
        The one I linked covers both logit and probit. There's a working paper version that you can find by googling the title.



        • #5
          Here's an example of how to calculate the AMEs using the famous Card & Krueger (1994) minimum-wage data:

          Code:
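          use http://fmwww.bc.edu/repec/bocode/c/CardKrueger1994.dta, clear
          * binary outcome; keep missing fte missing (in Stata, . > 17 is true)
          gen binary_y = (fte>17) if !missing(fte)
          
          * Version 1: explicit treated-x-post dummy t2 (1.treated#1.t is then dropped as collinear)
          gen t2 = (treated==1 & t == 1)
          qui probit binary_y t2 i.treated##i.t
          * Phi(xb) - Phi(xb - b_t2), evaluated for treated restaurants in the post period
          margins, expression(normal(predict(xb)) - ///
                              normal(predict(xb)-_b[t2])) at(treated=1 t=1 t2=1)
          
          * Version 2: factor-variable interaction only; note that this overwrites treated and t
          qui probit binary_y i.treated##i.t
          replace treated = 1
          replace t = 1
          margins, expression(normal(predict(xb)) - ///
                  normal(predict(xb)-_b[1.treated#1.t]))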
          The second version is a bit easier to relate to the Puhani formulas, but it overwrites your data, though that is avoidable (e.g., with preserve/restore).
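In illustrative notation for a probit with index $\beta_0 + \beta_1\,\text{treated} + \beta_2\,t + \beta_3\,(\text{treated}\times t)$ and no covariates, as in the example, both margins calls compute the change in the predicted probability for treated restaurants in the post period when the interaction is switched off. At treated = 1, t = 1, predict(xb) equals $\beta_0+\beta_1+\beta_2+\beta_3$, and subtracting $\beta_3$ (that is, _b[t2] in version 1 or _b[1.treated#1.t] in version 2) gives the counterfactual index:

```latex
\tau = \Phi\left(\beta_0 + \beta_1 + \beta_2 + \beta_3\right) - \Phi\left(\beta_0 + \beta_1 + \beta_2\right)
```

With covariates $X_i$, each index gains a term $X_i'\gamma$ and margins averages the difference over the estimation sample.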



          • #6
            Thank you very much for the code.
            Would you say that I can apply the Puhani formulas to a count data model as well?



            • #7
              If I understand correctly, the first piece of code does the following:

              - Estimates a probit model of the binary employment outcome on treatment, time, and the interaction term (treatment*time). Why is the interaction term introduced twice, once as t2 and once through i.treated##i.t, which also generates the interaction?

              - Then evaluates the margins according to Puhani's formula, i.e., the incremental effect of the interaction term. So it calculates the margin as the difference between the normal CDF evaluated at the estimated coefficients of the probit regression and the normal CDF evaluated at the same coefficients minus the interaction term. What is the at() option for in this case?

              Is this the correct interpretation?
              Last edited by Margherita Neri; 01 Aug 2016, 09:18.



              • #8
                1) If you remove the -qui- from the probit, you will see the second interaction is dropped. It's redundant, but does no harm.

                2) You have two observations for each restaurant, a pre and a post. The -at()- option allows you to set the covariates in both periods to the treated, post-period values. Equivalently, you could have used

                Code:
                margins if treated==1 & t==1, expression(normal(predict(xb)) - ///
                                    normal(predict(xb)-_b[t2]))
                3) The Poisson model works the same way, except that the inverse link (the mean function) is the exponential rather than the standard normal CDF:

                Code:
                poisson fte t2 i.treated##i.t, robust
                margins, expression(exp(predict(xb)) - ///
                                    exp(predict(xb)-_b[t2])) at(treated=1 t=1 t2=1)
                For brevity, I've skipped the standard errors here. Since you have dependent observations (a pre and a post for each restaurant), you might want to cluster on id.
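A sketch of the clustered version of the Poisson example, assuming (as the post suggests) that id identifies restaurants in the Card & Krueger data. The margins expression is unchanged; margins computes its delta-method standard error from whatever variance matrix the estimation command stored, so clustering in poisson carries through.

```stata
* Sketch: the Poisson DiD from post #8 with standard errors clustered by restaurant.
* Assumption: id identifies restaurants in the Card & Krueger data, as post #8 suggests.
use http://fmwww.bc.edu/repec/bocode/c/CardKrueger1994.dta, clear
gen t2 = (treated == 1 & t == 1)

* vce(cluster id) allows each restaurant's pre and post observations to be correlated;
* 1.treated#1.t should be dropped as collinear with t2, as post #8 reports for the probit
poisson fte t2 i.treated##i.t, vce(cluster id)

* Finite-difference effect on the expected count for treated restaurants after the
* policy; the delta-method SE uses the clustered e(V)
margins, expression(exp(predict(xb)) - ///
                    exp(predict(xb) - _b[t2])) at(treated=1 t=1 t2=1)
```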



                • #9
                  Do I get the same margin if I type

                  margins, dydx(t2) at(treated=1 t=1 t2=1)

                  or is

                  margins, dydx(t2) at(treated=1 t=1)

                  the correct one?
                  Last edited by Margherita Neri; 02 Aug 2016, 06:01.



                  • #10
                    The problem with the first is that you are treating t2 as a continuous variable, whereas it is binary. It calculates the effect of a small increase in t2, which will be close to, but not quite the same as, the finite-difference marginal effect in my code, which calculates the effect when t2 goes from 0 to 1. There might be a way to get Stata to calculate this with the factor variable i.t2 in the probit, but I have never managed to get it to work.

                    The second one will not be estimable.
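A sketch of the two quantities, writing $z = \beta_0 + \beta_1 + \beta_2 + \beta_3$ for the probit index at treated=1, t=1, t2=1, with $\beta_3$ the coefficient on t2 (notation illustrative). With t2 entered as a continuous variable, dydx(t2) returns the derivative on the left, while the expression in post #5 returns the finite difference on the right:

```latex
\left.\frac{\partial\,\Phi(z)}{\partial\,\mathrm{t2}}\right|_{\mathrm{t2}=1} = \phi(z)\,\beta_3
\qquad\text{versus}\qquad
\Phi(z) - \Phi(z - \beta_3) = \int_{z-\beta_3}^{z} \phi(u)\,du
```

The derivative uses the normal density at the single point $z$, while the finite difference equals $\beta_3$ times the average density over the interval from $z-\beta_3$ to $z$, so the two are close when $\beta_3$ is small and drift apart as it grows.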



                    • #11
                      Thank you, Dimitriy.

                      However, if I run the probit regression and specify i.t2 as a regressor, Stata treats t2 as a dummy variable. Then I calculate the margin with:

                      margins, dydx(t2) at (treated=1 t=1 t=2)

                      and I obtain the same margin as with your initial code:
                      margins, expression(exp(predict(xb)) - exp(predict(xb)-_b[t2])) at(treated=1 t=1 t2=1)
                      Last edited by Margherita Neri; 03 Aug 2016, 04:42.



                      • #12
                        I can't seem to replicate your results, so you need to be explicit about exactly what you typed into Stata (both probit and margins). Also, at(treated=1 t=1 t=2) does not make sense to me. Did you mean something else?



                        • #13
                          I ran the probit regression and calculated the margins with my own dataset; here I use abbreviations for my variables:
                          probvis = probability of a visit
                          interaction = treatment*time

                          The two methods that I used are:
                          1) probit probvis treatment time interaction
                          margins, expression(normal(predict(xb)) - normal(predict(xb)-_b[interaction])) at(treatment=1 time=1 interaction=1)


                          2) probit probvis treatment time i.interaction
                          margins, dydx(interaction) at(treatment=1 time=1 interaction=1)


                          With both methods I obtain the same value of the margin.



                          • #14
                            This looks good to me.
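A sketch of the same check on the Card & Krueger data from post #5, with binary_y and t2 as defined there. As in method 2 of post #13, treated and t enter as plain 0/1 regressors and only the interaction is a factor variable, so no factor interaction is dropped as collinear; with i.t2 the coefficient is named _b[1.t2] rather than _b[t2].

```stata
* Sketch: the equivalence reported in post #13, checked on the Card & Krueger data.
use http://fmwww.bc.edu/repec/bocode/c/CardKrueger1994.dta, clear
gen binary_y = (fte > 17) if !missing(fte)
gen t2 = (treated == 1 & t == 1)

* treated and t as plain 0/1 regressors, the interaction as a factor variable
probit binary_y treated t i.t2

* Puhani-style finite difference written out with the factor coefficient _b[1.t2]
margins, expression(normal(predict(xb)) - ///
                    normal(predict(xb) - _b[1.t2])) at(treated=1 t=1 t2=1)
matrix B = r(b)
scalar te_expr = B[1, 1]

* Factor-variable dydx: discrete change in Pr(binary_y = 1) as t2 goes from 0 to 1
margins, dydx(t2) at(treated=1 t=1) post
scalar te_dydx = _b[1.t2]

display "expression(): " scalar(te_expr) "   dydx(i.t2): " scalar(te_dydx)
```

The expression margin is stored before the second margins call because post replaces the probit results in e().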



                            • #15
                              Hi Dimitriy, sorry to write again.
                              I was going through the estimation of diff-in-diff effects as in Puhani (2012). As far as I understand, the treatment effect in a diff-in-diff is the marginal effect of the interaction term on the probability of success. Therefore, I do not understand the difference between computing this effect with the code you provided and simply asking Stata to compute:
                              margins, dydx(interaction)
                              In fact, I get different results (although the difference is not large), but I cannot work out the source of the difference.
                              Thank you again!
                              Last edited by Margherita Neri; 10 Aug 2016, 09:41.
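A sketch of one possible source of the gap, under stated assumptions: interaction enters the probit as a continuous regressor (method 1 of post #13); $\hat\beta_1$, $\hat\beta_2$, $\hat\beta_3$ are the coefficients on treatment, time and interaction; $x_i$ collects observation $i$'s own treatment, time, interaction and covariates $X_i$; and the sums run over the $N$ estimation observations (notation illustrative):

```latex
\text{dydx(interaction)} = \frac{1}{N}\sum_{i=1}^{N} \phi\left(x_i'\hat\beta\right)\hat\beta_3,
\qquad
\hat\tau = \frac{1}{N}\sum_{i=1}^{N}\left[\Phi\left(\hat\beta_0+\hat\beta_1+\hat\beta_2+\hat\beta_3+X_i'\hat\gamma\right)-\Phi\left(\hat\beta_0+\hat\beta_1+\hat\beta_2+X_i'\hat\gamma\right)\right]
```

The first averages a derivative over every observation at its observed treatment and time values, including untreated and pre-period observations; the second averages a finite difference after setting treatment = 1 and time = 1 for everyone, which is what at(treatment=1 time=1 interaction=1) does. Both the derivative-versus-difference distinction and the different evaluation points can move the estimate.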

