Diff in Diff: DRDID and CSDID

FernandoRios

Join Date: Apr 2014

Posts: 2497
#16

15 Sep 2021, 11:44

Hi Richard,
So, you are correct. The formula says that the PTA holds if growth in the control group equals the growth the treatment group would have experienced in the absence of treatment. This, however, cannot be estimated once treatment has been implemented.
So, the way it is proxied in CSDID (and other DID estimators) is to analyze that change using data from before the treatment took place.
Regarding the use of covariates:
The way CS state the model, all characteristics are time invariant, so X_{t}=X_{t-1}=X_{t-k}. So it doesn't matter which period you look at; you are using the same values for X.
Now, in practice, if you are using panel data, you use the period X_{t-1} to estimate the parallel trends (assuming t<g).

Using CS language, treatment effects and pretrends are estimated in exactly the same way. They are all called att(g,t), which stands for the average treatment effect for group g at time t.

When analyzing data "after" treatment (t>=g), att(g,t) compares outcomes in period t with outcomes in period g-1 (the last period without treatment). Here you use X_{g-1}.
If you instead analyze data before treatment (t<g), att(g,t) compares outcomes in periods t and t-1. These att(g,t)'s are used for the pretrend test. Here you use X_{t-1}.
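As a rough sketch, the two comparisons can be written out for the simplest case: no covariates and never-treated units as the comparison group. The symbols G = g (units first treated in period g) and C = 1 (never-treated units) are notation assumed here for illustration; with covariates, the same differences are taken conditional on X_{g-1} or X_{t-1}.

```latex
% Post-treatment periods (t >= g): long difference from the last untreated period
att(g,t) = E[\, Y_t - Y_{g-1} \mid G = g \,] - E[\, Y_t - Y_{g-1} \mid C = 1 \,]

% Pre-treatment periods (t < g): one-period difference, used for the pretrend test
att(g,t) = E[\, Y_t - Y_{t-1} \mid G = g \,] - E[\, Y_t - Y_{t-1} \mid C = 1 \,]
```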

What differs across estimators is simply how att(g,t) is estimated.

HTH
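A minimal sketch of how this looks in practice, using the mpdta example data that appears later in this thread. It assumes the installed csdid version provides the estat pretrend postestimation command; if it does not, the pre-treatment att(g,t) can still be read directly from the csdid output.

```stata
use https://friosavila.github.io/playingwithstata/drdid/mpdta.dta, clear

* Without agg(), csdid reports every att(g,t): pre-treatment (t<g) and post-treatment (t>=g)
csdid lemp lpop, ivar(countyreal) time(year) gvar(first_treat) method(dripw)

* Joint test that all pre-treatment att(g,t) are equal to zero
* (assumes this postestimation command is available in the installed version)
estat pretrend

* Same att(g,t) targets, estimated by outcome regression instead of doubly robust IPW
csdid lemp lpop, ivar(countyreal) time(year) gvar(first_treat) method(reg)
```

The two csdid calls target the same att(g,t); only the estimation method changes.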

Richard Thomas Boylan

Join Date: Sep 2018

Posts: 45
#17

15 Sep 2021, 13:43

Got it. Thank you so much.
Richard Thomas Boylan

Join Date: Sep 2018

Posts: 45
#18

22 Sep 2021, 08:29

Dumb question. I cannot figure out how to access the coefficients. Here is an example.
use https://friosavila.github.io/playing...rdid/mpdta.dta, clear
csdid lemp lpop , ivar(countyreal) time(year) gvar(first_treat) method(dripw) agg(event)
test T+0
gives me the error "T ambiguous abbreviation".
I have tried all sorts of permutations to figure out how to do an F test that all the leads (or lags) are jointly significant, and cannot work out how to do it.
FernandoRios

Join Date: Apr 2014

Posts: 2497
#19

22 Sep 2021, 10:41

Hi Richard,
Unfortunately, you cannot use "test" to do what you describe right now. I'll try to add an option in a future update.

There is, however, a trick that you can use.

Code:

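use https://friosavila.github.io/playingwithstata/drdid/mpdta.dta, clear
csdid lemp lpop , ivar(countyreal) time(year) gvar(first_treat) method(dripw) agg(event)
* Save the coefficient vector and its variance-covariance matrix
matrix b=e(b)
matrix V=e(V)
* Small eclass wrapper so that ereturn post can be called
program addex, eclass
    ereturn `0'
end
* Replace the original names (e.g. T+0) with plain names that "test" can parse
matrix colname b = t1 t2 t3 t4 t5 t6 t7
matrix colname V = t1 t2 t3 t4 t5 t6 t7
matrix rowname V = t1 t2 t3 t4 t5 t6 t7
* Post b and V as the current estimation results, then test
addex post b V
test t1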

So the idea is to take the coefficients and their variance-covariance matrix, rename all columns and rows, and post them as a new set of estimation results; you can then test the coefficients with "test".
HTH
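The renaming step above can also be written without hard-coding the number of coefficients, and the posted names can then be tested jointly. This is only a sketch under the same csdid call as above: the names t1, t2, ... are arbitrary labels, and which of them correspond to leads or lags has to be read off the csdid event-study output. The choice of t1 t2 t3 in the final line is purely illustrative.

```stata
use https://friosavila.github.io/playingwithstata/drdid/mpdta.dta, clear
csdid lemp lpop, ivar(countyreal) time(year) gvar(first_treat) method(dripw) agg(event)
matrix b = e(b)
matrix V = e(V)

* Build one plain name per coefficient: t1 t2 ... tk
local k = colsof(b)
local names
forvalues i = 1/`k' {
    local names `names' t`i'
}
matrix colnames b = `names'
matrix colnames V = `names'
matrix rownames V = `names'

* Small eclass wrapper so that b and V can be posted as e(b) and e(V)
capture program drop addex
program addex, eclass
    ereturn `0'
end
addex post b V

* Joint Wald test that the listed coefficients are all zero
* (illustrative choice; check the csdid output to see which are leads)
test t1 t2 t3
```

Listing several coefficients after test produces a single joint test, which is the lead (or lag) test asked about above.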
Richard Thomas Boylan

Join Date: Sep 2018

Posts: 45
#20

22 Sep 2021, 10:57

Thanks for coming up with a solution & giving an explanation for how your code works.

Angelo Cozzubo

Join Date: Mar 2015
Posts: 32

#21

20 Oct 2021, 10:11

Dear Fernando,

Thank you very much for this implementation. I am having a blast experimenting with it as it is very clear and straightforward to run.

I want to ask you how you would recommend estimating heterogeneous treatment effects. From your 2021 Stata Conference slides, I read that "Embrace TE heterogeneity in the same way as teffects does in cross-section setups." However, I am not sure how to implement it.

Please, would you be able to give an example? As an experiment, I was trying to estimate heterogeneous effects for lpop quantiles from your mpdta database from the help file (even if this does not make much sense).

Code:

use https://friosavila.github.io/playingwithstata/drdid/mpdta.dta, clear
xtile q_lpop = lpop, n(5)
tab q_lpop

5 quantiles |
    of lpop |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |        500       20.00       20.00
          2 |        500       20.00       40.00
          3 |        500       20.00       60.00
          4 |        500       20.00       80.00
          5 |        500       20.00      100.00
------------+-----------------------------------
      Total |      2,500      100.00

*This gives me the ATT aggregation but no heterogeneous effects. It uses q_lpop as a control variable.

csdid lemp q_lpop , ivar(countyreal) time(year) gvar(first_treat) method(dripw) agg(simple)
............
Difference-in-difference with Multiple Time Periods
Outcome model  : least squares
Treatment model: inverse probability
------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         ATT |  -.0411743     .01142    -3.61   0.000    -.0635571   -.0187915
------------------------------------------------------------------------------
Control: Never Treated

How should I proceed with the csdid command to compute the heterogeneous effects of q_lpop?

Thanks a lot!


FernandoRios

Join Date: Apr 2014

Posts: 2497
#22

20 Oct 2021, 12:02

So, first, those are really Pedro Sant'Anna's slides. He is the one who came up with the estimator; I'm just the interpreter!
Second, you cannot estimate different effects by q_lpop. All the heterogeneity comes from the treatment timing and from when the effect is measured.

If you do only
csdid lemp q_lpop , ivar(countyreal) time(year) gvar(first_treat) method(dripw)

It will produce, by default, treatment effects for all groups (those treated in 2004, 2006 and 2007) at all points in time (2004, 2005, 2006, 2007).
That is the kind of heterogeneity referred to in that last slide.
F
Angelo Cozzubo

Join Date: Mar 2015

Posts: 32
#23

20 Oct 2021, 18:39

Thank you so much for the clarification and the quick response.

Please, let me know if you have any hints on approaching heterogeneous effects (by a covariate). The referees have been very demanding in this regard.

Thanks again!
FernandoRios

Join Date: Apr 2014

Posts: 2497
#24

20 Oct 2021, 18:55

If q_lpop has enough variation, you can always try running csdid by subsample:
xtile qq=q_lpop, n(5)
csdid y if qq==1
etc.
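A hedged sketch of the subsample idea, spelled out for the mpdta example from earlier in the thread. It assumes the splitting variable is constant within county (so each county stays in one subsample for all years) and that each quintile retains enough treated and never-treated counties; capture noisily is used so the loop continues if one subsample cannot be estimated.

```stata
use https://friosavila.github.io/playingwithstata/drdid/mpdta.dta, clear
xtile q_lpop = lpop, n(5)

* One csdid run per quintile of lpop; compare the ATT across quintiles
forvalues q = 1/5 {
    display as text _n "Quintile `q' of lpop"
    capture noisily csdid lemp if q_lpop == `q', ///
        ivar(countyreal) time(year) gvar(first_treat) method(dripw) agg(simple)
}
```

Each run uses only the counties in that quintile, so standard errors will be wider than in the full-sample estimate.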
Richard Thomas Boylan

Join Date: Sep 2018

Posts: 45
#25

28 Oct 2021, 07:58

When I run csdid, I get the following output

................xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxx..................xxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxx

Eventually, I get estimates, but I am concerned about using them.

I assume that the "x" are bootstrap samples where the procedure produced an error. Is there any way to figure out what is causing the error?

P.S. : I tried "set trace on" but I did not get anything useful out of it, or maybe I missed what I should have seen.
FernandoRios

Join Date: Apr 2014

Posts: 2497
#26

28 Oct 2021, 11:30

Hi Richard
No, the "x" and "." are not bootstrap repetitions. Each dot represents a particular 2x2 DID estimate. If an "x" appears, it usually means that that particular ATT(g,t) could not be estimated, for example because of insufficient data.
In all those cases, you will see blanks in the basic CSDID output.
Let me know if you have other questions
F
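A quick, informal way to see which group-time cells are thin is to cross-tabulate the cohort variable against the time variable before running csdid. The variable names below are those of the mpdta example and are assumptions for any other data set; this only counts observations and does not reproduce csdid's internal checks.

```stata
use https://friosavila.github.io/playingwithstata/drdid/mpdta.dta, clear

* Observations per treatment cohort (rows) and year (columns);
* in mpdta, first_treat == 0 marks the never-treated counties
tabulate first_treat year
```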
Richard Thomas Boylan

Join Date: Sep 2018

Posts: 45
#27

28 Oct 2021, 12:35

OK, great! I am surprised, however, at how long it takes the program to figure out that a particular 2x2 DID cannot be estimated.

So, for instance, in the output

................xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

the string of "xxxx" took a *long* time to estimate.

Also, the csdid help file states: "Additionally, you may not need ALL periods, requiring only few periods before the first treatment year." It seems to me that by doing that, one ends up with a bunch of 2x2 DIDs that cannot be estimated, so it would be nice if there were a faster way to get through those.
FernandoRios

Join Date: Apr 2014

Posts: 2497
#28

28 Oct 2021, 13:15

Yes, I agree.
In some cases the x's occur during the logit/ipt step, so perhaps that is what is holding the process up.
I'll try to check that in the code.
Richard Thomas Boylan

Join Date: Sep 2018

Posts: 45
#29

28 Oct 2021, 13:23

Thanks. Do you need me to generate a data set that causes these problems?
FernandoRios

Join Date: Apr 2014

Posts: 2497
#30

28 Oct 2021, 14:01

That would be very helpful. If you can, please contact me at [email protected] and send me a replicable example. I can then take a look and see whether my guess about the problem is correct.
Thank you