
  • Diff in diff with IV for treatment dummy

    Hi all,

    For my data analysis, I am estimating a difference-in-difference model of the sort
    Code:
     Y = a + b*POST + c*TREAT + d*POST*TREAT + e*CONTROLS + u
    Since I am concerned with the endogeneity of my treatment variable TREAT, I would like to employ an IV strategy, in line with the relevant literature.

    However, I am not sure how to run the IV regression in Stata, given that TREAT is part of an interaction term. I do not think the following code gives me the correct result:
    Code:
     ivregress 2sls Y POST (TREAT = Z1)  POST##TREAT CONTROLS
    What is the right way of going about this?
    Many thanks in advance.

    Best,
    Sophia
    Last edited by Sophia Magis; 20 Apr 2021, 02:59. Reason: added tags

  • #2
    Please do let me know if you require any more information to be able to answer the question.



    • #3
      Try this:
      Code:
      ivregress 2sls y post (i.treat i.treat#i.post = z) x
      I am not sure if this is completely valid econometrically, but it does what you ask.



      • #4
        Thank you, Dimitriy. However, this code gives me the error
        Code:
        equation not identified; must have at least as many instruments not in
        the regression as there are instrumented variables
        Do I understand correctly that you are suggesting it does not make sense econometrically to use an IV for a treatment variable in a diff-in-diff setting? Could you please explain why? Thanks a lot!



        • #5
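          You will need additional instruments, since you have two endogenous variables (the treatment dummy and its interaction with the time dummy). One common approach is to interact the instrument with the time dummy: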

          Code:
          . use http://fmwww.bc.edu/repec/bocode/c/CardKrueger1994.dta
          (Dataset from Card&Krueger (1994))
          
          . ivregress 2sls fte i.t (i.treated#i.t = bk bk#i.t)
          note: 1.bk#1.t omitted because of collinearity
          
          Instrumental variables (2SLS) regression          Number of obs   =        801
                                                            Wald chi2(3)    =       0.60
                                                            Prob > chi2     =     0.8965
                                                            R-squared       =          .
                                                            Root MSE        =     80.642
          
          ------------------------------------------------------------------------------
                   fte |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
             treated#t |
                 NJ#0  |  -225.4102    454.179    -0.50   0.620    -1115.585    664.7642
                 NJ#1  |  -180.7203   303.9624    -0.59   0.552    -776.4758    415.0351
                       |
                   1.t |   -36.2768   440.8818    -0.08   0.934    -900.3893    827.8357
                 _cons |   199.5123   366.5129     0.54   0.586    -518.8398    917.8644
          ------------------------------------------------------------------------------
          Instrumented:  1.treated#0b.t 1.treated#1.t
          Instruments:   1.t bk 1.bk#0b.t
          But I am not sure if this is strictly valid since I know nothing about your empirical setting and what kind of endogeneity problem you have.

          You should probably think about adjusting your standard errors for clustering as well.
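          A sketch of the identification count behind #3 to #5, written with the symbols of the equation in the original post (Z1 is the excluded instrument there). Because TREAT is endogenous, the product POST*TREAT is endogenous too, so there are two instrumented variables, and the order condition asks for at least two excluded instruments. The natural second one is POST*Z1:

          ```latex
          \begin{align*}
          Y &= a + b\,POST + c\,TREAT + d\,(POST \times TREAT) + e\,CONTROLS + u \\
          \text{endogenous} &: \{TREAT,\; POST \times TREAT\} \\
          \text{excluded instruments} &: \{Z1,\; POST \times Z1\} \\
          \#\{\text{excluded instruments}\} &= 2 \;\ge\; 2 = \#\{\text{endogenous regressors}\}
          \end{align*}
          ```

          With z alone, as in #3, the count is 1 < 2, which is the "equation not identified" error reported in #4. In the Card-Krueger example, bk and bk#i.t play the roles of Z1 and POST*Z1. Meeting the count is necessary but not sufficient: the instrument still has to be relevant and excludable in the particular application.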



          • #6
            Thank you so much, Dimitriy, and apologies for the delayed reply. I managed to adapt the code to my data, and it works. However, I also need 1.treated (in my case 1.user) to be included in the regression output. Any suggestions on how to achieve that? Many thanks!

            Code:
            . ivregress 2sls lneducexp i.shock (i.user#i.shock = c.agdist c.agdist#i.shock), vce(cluster hhid)
            
            Instrumental variables (2SLS) regression          Number of obs   =        595
                                                              Wald chi2(3)    =       4.90
                                                              Prob > chi2     =     0.1794
                                                              R-squared       =          .
                                                              Root MSE        =     1.9605
            
                                                (Std. Err. adjusted for 427 clusters in hhid)
            ---------------------------------------------------------------------------------
                            |               Robust
                  lneducexp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            ----------------+----------------------------------------------------------------
                 user#shock |
                       1 0  |   .8662527   .8800556     0.98   0.325    -.8586246     2.59113
                       1 1  |   4.418828    9.02754     0.49   0.624    -13.27483    22.11248
                            |
                    1.shock |  -1.907025   3.920626    -0.49   0.627    -9.591311     5.77726
                      _cons |   7.902556   .4587876    17.22   0.000     7.003349    8.801763
            ---------------------------------------------------------------------------------
            Instrumented:  1.user#0b.shock 1.user#1.shock
            Instruments:   1.shock agdist 1.shock#c.agdist



            • #7
              I don't understand how your model above relates to the equation in your original question. But, presumably, you need to include i.user in your specification if you want to see the coefficient.
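              A worked note on the output in #6, assuming user and shock are coded 0/1. Without a separate i.user, the term i.user#i.shock creates the two cells user*(1 - shock) and user*shock. Those two cells are a linear reparametrization of user and user*shock. The gamma symbols below are illustrative labels for the two cell coefficients:

              ```latex
              \begin{align*}
              \gamma_{10}\, user\,(1 - shock) + \gamma_{11}\, user \cdot shock
                &= \gamma_{10}\, user + (\gamma_{11} - \gamma_{10})\, user \cdot shock \\
              \widehat{\beta}_{user} &= \widehat{\gamma}_{10} = 0.8662527 \\
              \widehat{\beta}_{user \times shock} &= \widehat{\gamma}_{11} - \widehat{\gamma}_{10} = 4.418828 - 0.8662527 = 3.5525753
              \end{align*}
              ```

              The instrument list does not change (1.shock, agdist and 1.shock#c.agdist). 2SLS point estimates are invariant to a nonsingular linear re-labelling of the endogenous regressors, so these numbers should match a specification that includes i.user directly. The table does not report the standard error of the difference. lincom after ivregress can compute it from the estimated covariance matrix.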



              • #8
                Let me try again; sorry if I have not been clear so far.

                So basically, I estimate the most basic form of my model via:

                Code:
                reg lneducexp i.shock##i.user, vce(cluster hhid)
                This is a basic DID setup: I analyse whether the education expenditure of mobile money users is more resistant to exogenous illness shocks than that of non-users, i.e., whether mobile money users can smooth consumption better than non-users when hit by a shock.

                However, the decision to become a mobile money user is by no means random; it depends on several observable and unobservable factors. I would therefore like to use the distance to the nearest mobile money agent (c.agdist) as an IV for i.user, which is a common instrument in the literature. I was confused about how to implement this in Stata because i.user is part of an interaction term in my model. In the IV regression output, I need the coefficients on i.user, i.shock and 1.user#1.shock.
                Hope this makes more sense now!
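                A sketch of the model described above, written out as a structural equation plus first stages. It assumes user and shock are 0/1 dummies, uses agdist as the excluded instrument for user, and omits controls. The beta and pi coefficient names are illustrative:

                ```latex
                \begin{align*}
                lneducexp &= \beta_0 + \beta_1\, shock + \beta_2\, user + \beta_3\, (shock \cdot user) + u \\
                user &= \pi_{10} + \pi_{11}\, shock + \pi_{12}\, agdist + \pi_{13}\, (shock \cdot agdist) + v_1 \\
                shock \cdot user &= \pi_{20} + \pi_{21}\, shock + \pi_{22}\, agdist + \pi_{23}\, (shock \cdot agdist) + v_2
                \end{align*}
                ```

                Both first stages use the same exogenous variables: the constant, shock, agdist and shock*agdist. beta_2 is the coefficient on i.user, and beta_3 is the DiD coefficient on 1.user#1.shock. Because there are two endogenous terms and two excluded instruments (agdist and shock*agdist), the system is exactly identified.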



                • #9
                  Sophia: This is where the Stata factor notation can be confusing. I would do the following, essentially what Dimitriy suggested. I prefer using "c." in these situations so I don't get extra (dropped) interactions.

                  Code:
                  ivregress 2sls lneducexp c.shock (c.user c.shock#c.user = c.agdist c.shock#c.agdist), vce(cluster hhid)
                  JW
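                  A short note on why the c. prefix works here. It relies on user and shock being coded 0/1. For such variables, the indicator of each level equals the variable itself, so c.shock#c.user enters as a single product regressor. Without c., Stata expands the term into a set of factor cells, some of which are then omitted:

                  ```latex
                  \mathbf{1}\{user = 1\} = user, \qquad
                  \mathbf{1}\{shock = 1\} \cdot \mathbf{1}\{user = 1\} = shock \times user
                  \qquad \text{for } shock,\ user \in \{0, 1\}
                  ```

                  With another coding (say 1/2), c.user would no longer be the treatment dummy. It is therefore worth confirming the 0/1 coding before switching from i. to c.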



                  • #10
                    Thank you very much, Mr. Wooldridge; this works and is exactly what I was looking for.



                    • #11
                      Hi Jeff Wooldridge,

                      Thank you for providing the code for the Wald-DID. I'm trying to replicate it to estimate the local average treatment effect from my generalized DID:

                      Code:
                       
                       svy: regress OUTCOMEVAR i.time##i.intervention
                      using the following code:

                      Code:
                       
                       svy: ivregress 2sls OUTCOMEVAR i.time (i.intervention i.time#i.intervention = i.instrument i.time#i.instrument)
                      and heterogeneous effects by gender, using the following:

                      Code:
                       
                       svy: ivregress 2sls OUTCOMEVAR i.time i.female i.time#i.female (i.intervention i.intervention#i.female i.time#i.intervention i.time#i.intervention#i.female = i.instrument i.instrument#i.female i.time#i.instrument i.time#i.instrument#i.female)
                      where time (pre/post), intervention (0 = "Control", 1 = "Treatment 1", 2 = "Treatment 2") and instrument (indicating whether the treatment was actually administered, for non-attriters; 0 = "Control", 1 = "Treatment 1", 2 = "Treatment 2") are categorical variables.
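                      A sketch of the order-condition count for the two ivregress specifications above. It assumes every level of intervention and instrument occurs in both periods, and for both genders, in the estimation sample. With three levels, each factor term contributes two non-base indicators; time and female contribute one each:

                      ```latex
                      \begin{align*}
                      \text{basic model:}\quad
                        &\underbrace{2 + 2}_{\text{endogenous}} = 4 = \underbrace{2 + 2}_{\text{excluded instruments}} \\
                      \text{by gender:}\quad
                        &\underbrace{2 + 2 + 2 + 2}_{\text{endogenous}} = 8 = \underbrace{2 + 2 + 2 + 2}_{\text{excluded instruments}}
                      \end{align*}
                      ```

                      In the basic model, the endogenous terms are i.intervention and i.time#i.intervention, and the excluded instruments are i.instrument and i.time#i.instrument. Both specifications are exactly identified. If any non-base indicator of instrument drops out of the estimation sample or is omitted as collinear, the excluded instruments fall short of the endogenous regressors, and ivregress reports that the equation is not identified.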

                      At first, the basic ivregress code worked perfectly, until I noticed that I had made a small error when generating the variable instrument. After correcting it, I keep getting the message "instrumental variable equation not identified; must have at least as many instruments not in the regression as there are instrumented variables", even though I have not changed the code. I have gone over your very informative books (Introductory Econometrics and Econometric Analysis of Cross Section and Panel Data), but I am struggling to implement IV in the DID setting and to understand the issue. I would be very grateful for any guidance.

                      Please find the log file attached. Apologies for cross-posting (https://www.statalist.org/forums/for...in-differences).

                      Thank you


                      • #12
                        Hi all,

                        I've rerun the following code with trace set on, comparing the output using the "wrong" and the "corrected" instrument.

                        Code:
                         
                         svy: ivregress 2sls OUTCOMEVAR i.time (i.intervention i.time#i.intervention = i.instrument i.time#i.instrument)
                        When using the "wrong" instrument, the instruments generated in the background were:
                        0b.instrument 1.instrument 3.instrument 0b.time#0b.instrument 0b.time#1o.instrument 0b.time#3o.instrument 1o.time#0b.instrument 1.time#1.instrument 1.time#3.instrument,

                        as expected. However, when using the "corrected" instrument, only the following were generated:
                        0b.instrument 1.instrument 0b.time#0b.instrument 0b.time#1o.instrument 0b.time#3o.instrument 1o.time#0b.instrument 1.time#1.instrument,
                        that is, 3.instrument and 1.time#3.instrument were not generated. Does anyone know what the problem could be?
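                        A sketch that counts the non-omitted terms in the two listings, taking the trace output at face value. The "wrong" variable supplies four excluded instruments: 1.instrument, 3.instrument, 1.time#1.instrument and 1.time#3.instrument. The corrected one supplies only two. Both face four instrumented terms, presumably 1.intervention, 2.intervention and their interactions with time:

                        ```latex
                        \begin{align*}
                        \text{wrong instrument:} &\quad 4 \ge 4 \quad \text{(identified)} \\
                        \text{corrected instrument:} &\quad 2 < 4 \quad \text{(not identified)}
                        \end{align*}
                        ```

                        That would produce the "equation not identified" message. It suggests that, in the estimation sample, the corrected variable no longer has observations at its third level, or that this level is collinear with other regressors there. Tabulating instrument against time and intervention within the estimation sample should show which cells are empty.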


                        Thank you
