Diff in diff with IV for treatment dummy

Sophia Magis

Join Date: Nov 2020

Posts: 16
#1


20 Apr 2021, 02:57

Hi all,

For my data analysis, I am estimating a difference-in-differences model of the form

Code:

Y = a + b*POST + c*TREAT + d*POST*TREAT + e*CONTROLS + u

Since I am concerned with the endogeneity of my treatment variable TREAT, I would like to employ an IV strategy, in line with the relevant literature.

However, I am not sure how to run the IV regression in Stata, given that TREAT is part of an interaction term. I do not think the following code gives me the correct result:

Code:

ivregress 2sls Y POST (TREAT = Z1) POST##TREAT CONTROLS

What is the right way of going about this?
Many thanks in advance.

Best,
Sophia

Last edited by Sophia Magis; 20 Apr 2021, 02:59. Reason: added tags
Tags: diff-in-diff, IV
Sophia Magis

Join Date: Nov 2020

Posts: 16
#2

20 Apr 2021, 05:41

Please do let me know if you require any more information to be able to answer the question.
Dimitriy V. Masterov

Join Date: Mar 2014

Posts: 609
#3

20 Apr 2021, 12:19

Try this:

Code:

ivregress 2sls y post (i.treat i.treat#i.post = z) x

I am not sure if this is completely valid econometrically, but it does what you ask.
Sophia Magis

Join Date: Nov 2020

Posts: 16
#4

20 Apr 2021, 12:42

Thank you, Dimitriy. However, this code gives me the following error:

Code:

equation not identified; must have at least as many instruments not in the regression as there are instrumented variables

Do I understand correctly that you are suggesting it does not make sense econometrically to use an IV for a treatment variable in a diff-in-diff setting? Could you please explain why? Thanks a lot!

Dimitriy V. Masterov

Join Date: Mar 2014

Posts: 609
#5

20 Apr 2021, 12:58

You will need additional instruments since you have two endogenous variables. One common approach is this:

Code:

. use http://fmwww.bc.edu/repec/bocode/c/CardKrueger1994.dta
(Dataset from Card&Krueger (1994))

. ivregress 2sls fte i.t (i.treated#i.t = bk bk#i.t)
note: 1.bk#1.t omitted because of collinearity

Instrumental variables (2SLS) regression          Number of obs   =        801
                                                  Wald chi2(3)    =       0.60
                                                  Prob > chi2     =     0.8965
                                                  R-squared       =          .
                                                  Root MSE        =     80.642

------------------------------------------------------------------------------
         fte |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
   treated#t |
       NJ#0  |  -225.4102    454.179    -0.50   0.620    -1115.585    664.7642
       NJ#1  |  -180.7203   303.9624    -0.59   0.552    -776.4758    415.0351
             |
         1.t |   -36.2768   440.8818    -0.08   0.934    -900.3893    827.8357
       _cons |   199.5123   366.5129     0.54   0.586    -518.8398    917.8644
------------------------------------------------------------------------------
Instrumented:  1.treated#0b.t 1.treated#1.t
Instruments:   1.t bk 1.bk#0b.t

But I am not sure if this is strictly valid since I know nothing about your empirical setting and what kind of endogeneity problem you have.

You should probably think about adjusting your standard errors for clustering as well.
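
The identification error in #4 is the order condition at work: with two endogenous terms (the treatment dummy and its interaction with the period dummy), at least two excluded instruments are needed. A minimal sketch on the same Card and Krueger data, building the products by hand so the count is easy to see. The names treated_t and bk_t are made up for illustration, and whether bk is a credible instrument here is not something this example settles.

```stata
use http://fmwww.bc.edu/repec/bocode/c/CardKrueger1994.dta, clear

* hand-made interaction terms (hypothetical variable names)
generate treated_t = treated * t
generate bk_t      = bk * t

* two endogenous regressors, one excluded instrument:
* expected to stop with the "equation not identified" error
capture noisily ivregress 2sls fte t (treated treated_t = bk)

* two endogenous regressors, two excluded instruments:
* the order condition is satisfied
ivregress 2sls fte t (treated treated_t = bk bk_t)
```

Satisfying the count is only a necessary condition; the instruments still have to be relevant and excludable in the actual application.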


Sophia Magis

Join Date: Nov 2020

Posts: 16
#6

04 May 2021, 04:11

Thank you so much Dimitriy, and apologies for the delayed reply. I managed to reproduce the code in my data, and it works. However, I would also need 1.treated to be included in the regression output (in my case 1.user) - any suggestions on how to achieve that? Many thanks!

Code:

. ivregress 2sls lneducexp i.shock (i.user#i.shock = c.agdist c.agdist#i.shock), vce(cluster hhid)

Instrumental variables (2SLS) regression          Number of obs   =        595
                                                  Wald chi2(3)    =       4.90
                                                  Prob > chi2     =     0.1794
                                                  R-squared       =          .
                                                  Root MSE        =     1.9605

                                    (Std. Err. adjusted for 427 clusters in hhid)
---------------------------------------------------------------------------------
                |               Robust
      lneducexp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
     user#shock |
           1 0  |   .8662527   .8800556     0.98   0.325    -.8586246     2.59113
           1 1  |   4.418828    9.02754     0.49   0.624    -13.27483    22.11248
                |
        1.shock |  -1.907025   3.920626    -0.49   0.627    -9.591311     5.77726
          _cons |   7.902556   .4587876    17.22   0.000     7.003349    8.801763
---------------------------------------------------------------------------------
Instrumented:  1.user#0b.shock 1.user#1.shock
Instruments:   1.shock agdist 1.shock#c.agdist


Dimitriy V. Masterov

Join Date: Mar 2014

Posts: 609
#7

04 May 2021, 15:00

I don't understand how your model above relates to the equation in your original question. But, presumably, you need to include i.user in your specification if you want to see the coefficient.
Sophia Magis

Join Date: Nov 2020

Posts: 16
#8

05 May 2021, 08:47

Let me try again, sorry if I wasn't really clear so far.

So basically, I estimate the most basic form of my model via:

Code:

reg lneducexp i.shock##i.user, vce(cluster hhid)

This is essentially a DID setup: I analyse whether mobile money users' education expenditure is more resilient to exogenous illness shocks than that of non-users, i.e. whether users can smooth consumption better than non-users when a shock occurs.

However, the decision to become a mobile money user is by no means random; it depends on several observable and unobservable factors. I would therefore like to use distance to the nearest mobile money agent (c.agdist) as an IV for i.user, which is a common instrument in the literature. I was unsure how to implement this in Stata because i.user is part of an interaction term in my model. In my IV regression output, I need the coefficients for i.user, i.shock and 1.user#1.shock.
Hope this makes more sense now!
Jeff Wooldridge

Join Date: Apr 2014

Posts: 2124
#9

05 May 2021, 08:59

Sophia: This is where the Stata factor notation can be confusing. I would do the following, essentially what Dimitriy suggested. I prefer using "c." in these situations so I don't get extra (dropped) interactions.

Code:

ivregress 2sls lneducexp c.shock (c.user c.shock#c.user = c.agdist c.shock#c.agdist), vce(cluster id)
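
To see what the "extra (dropped) interactions" are, fvexpand can list the terms each notation generates. A small sketch, assuming the data set containing user and shock is already in memory:

```stata
* terms created by the factor-variable (i.) interaction:
* one entry per combination of levels, with base/omitted cells flagged
fvexpand i.user#i.shock
display "`r(varlist)'"

* terms created by the continuous (c.) interaction:
* a single product term
fvexpand c.user#c.shock
display "`r(varlist)'"
```

With 0/1 dummies, c.user#c.shock is simply the product user*shock, so the instrumented list contains exactly two terms and nothing is silently dropped.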

JW
Sophia Magis

Join Date: Nov 2020

Posts: 16
#10

05 May 2021, 11:34

Thank you very much, Mr. Wooldridge, this works and is exactly what I was looking for.
Nicolas Orgeira

Join Date: Sep 2015

Posts: 165
#11

29 Jun 2021, 19:10

Hi Jeff Wooldridge,

Thank you for providing the code for the Wald-DID. I'm trying to replicate it to estimate the local average treatment effect from my generalized DID:

Code:

svy: regress OUTCOMEVAR i.time##i.intervention

using the following code:

Code:

svy: ivregress 2sls OUTCOMEVAR i.time (i.intervention i.time#i.intervention = i.instrument i.time#i.instrument)

and heterogeneous effects by gender, using the following:

Code:

svy: ivregress 2sls OUTCOMEVAR i.time i.female i.time#i.female (i.intervention i.intervention#i.female i.time#i.intervention i.time#i.intervention#i.female = i.instrument i.instrument#i.female i.time#i.instrument i.time#i.instrument#i.female)

where time (pre/post), intervention (0 = "Control", 1 = "Treatment 1", 2 = "Treatment 2") and instrument (indicating whether the treatment was actually administered for non-attriters; 0 = "Control", 1 = "Treatment 1", 2 = "Treatment 2") are categorical variables.

The basic ivregress code worked fine until I noticed I had made a small error when generating the variable instrument. After correcting it, I keep getting the following message: "instrumental variable equation not identified; must have at least as many instruments not in the regression as there are instrumented variables", even though I have not changed the code. I have gone over your very informative books (Introductory Econometrics and Econometric Analysis of Cross Section and Panel Data), but I am struggling to implement IV in the DID setting and to understand the issue. I would be very grateful if you could provide me with some guidance.

Please find attached the log file. Apologies for the cross-posting (https://www.statalist.org/forums/for...in-differences).

Thank you
Attached Files

ivregress.smcl (11.0 KB, 2 views)
Nicolas Orgeira

Join Date: Sep 2015

Posts: 165
#12

09 Jul 2021, 13:21

Hi all,

I've rerun the following code with trace on, comparing the output when using the "wrong" and the "right" instrument.

Code:

svy: ivregress 2sls OUTCOMEVAR i.time (i.intervention i.time#i.intervention = i.instrument i.time#i.instrument)

When using the "wrong" instrument, the instruments generated in the background were:
0b.instrument 1.instrument 3.instrument 0b.time#0b.instrument 0b.time#1o.instrument 0b.time#3o.instrument 1o.time#0b.instrument 1.time#1.instrument 1.time#3.instrument,

as expected. However, when using the "corrected" instrument, only the following were generated:
0b.instrument 1.instrument 0b.time#0b.instrument 0b.time#1o.instrument 0b.time#3o.instrument 1o.time#0b.instrument 1.time#1.instrument,
that is, 3.instrument and 1.time#3.instrument were not included in the background. Would anyone know what the problem could be?
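
One thing that may be worth checking (a guess, since the data are not shown) is whether every level of the corrected instrument still has observations in both periods within the estimation sample. A level with no usable observations is dropped from the expansion, which can leave fewer excluded instruments than endogenous terms. A quick check, using the variable names from the code above:

```stata
* cell counts by instrument level and period, including missing values
tabulate instrument time, missing

* the same counts restricted to observations with a nonmissing outcome
tabulate instrument time if !missing(OUTCOMEVAR), missing

* how the instrument lines up with the endogenous treatment variable
tabulate instrument intervention, missing
```

If a level of instrument turns out to be empty or entirely missing after the correction, that would be consistent with 3.instrument and 1.time#3.instrument disappearing from the list.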

Thank you
