Diff in Diff: DRDID and CSDID

Gabor Mugge

Join Date: Apr 2021

Posts: 30
#166

07 Jul 2022, 03:49

Hi Fernando,

Thank you for your reply!
You can access the data here. (The `var' variable should be changed to cell_counter in the previous codes.)
I did check that I have the latest versions of csdid and drdid.

Thank you,
Gabor
Comment
Leon Schmidt

Join Date: Apr 2018

Posts: 98
#167

07 Jul 2022, 03:54

Dear FernandoRios,

Thanks a lot for your answer, no worries, I thought so!

I would be very grateful if you have any advice on the following issue. I am studying the effect of a change in a firm's legal form on the adoption of a new technology.

My main code is this:

Code:

csdid technology, ivar(firm_id) time(year) gvar(legal_form) method(drimp) notyet
estat all

I am using the latest versions of csdid and drdid (1.57 and 1.67).

The tab between year and the gvar (legal_form) is the following (and similarly for the years after 1880 in the top row; I didn't want to include a second screenshot but I am happy to do so if necessary):

If I do this, I get estimates for g1865 but not for g1866 and onwards. They are all omitted. It looks like this:

However, I then installed an earlier version of drdid. Specifically, I ran "net install csdid, from ("https://raw.githubusercontent.com/friosavila/csdid_drdid/main/code/") replace", which I found on your website. This installs version 1.63 of drdid. Now the g's look very different, and I also get pre-treatment averages (and effects by year).

Additionally, when I am using the latest version of drdid and drop the first year, the program then estimates the g's for the new first year (so g1866) but not for the following years.

So I am wondering whether you have any idea what's going on here.

Thank you very much for your help!

Last edited by Leon Schmidt; 07 Jul 2022, 03:58.
Comment
Georgina Shapley

Join Date: Jul 2022

Posts: 3
#168

08 Jul 2022, 11:12

Dear Fernando,

Thank you so much for your help so far. I have yet another question regarding the post-estimation output, and I apologize because I am sure my question is trivial, but: I don't understand what estat cevent, window(t1 t2) is exactly. The help file says that this command estimates censored event averages. Then, it further explains that it estimates the average across all ATTGT's that correspond to periods between t1 and t2, inclusive. But I cannot figure out what ATTGT's are being averaged.

As an example, with the data you kindly provide in the help file, https://friosavila.github.io/playing...rdid/mpdta.dta,

estat cevent, window(-3 -1)

has as output
ATT for events between -3 -1
Event Study:Aggregate effects
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
        ATTC |  -.0023082   .0075001    -0.31   0.758    -.0170082    .0123917
------------------------------------------------------------------------------

which I seem unable to obtain either from the output of estat event, which is

ATT by Periods Before and After treatment
Event Study:Dynamic effects
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
     Pre_avg |  -.0000442   .0075204    -0.01   0.995     -.014784    .0146955
    Post_avg |  -.0803539   .0189576    -4.24   0.000    -.1175101   -.0431978
         Tm3 |   .0267278   .0140657     1.90   0.057    -.0008404     .054296
         Tm2 |  -.0036165   .0129283    -0.28   0.780    -.0289555    .0217226
         Tm1 |   -.023244   .0144851    -1.60   0.109    -.0516343    .0051463
         Tp0 |  -.0210604   .0114942    -1.83   0.067    -.0435886    .0014679
         Tp1 |  -.0530032   .0163465    -3.24   0.001    -.0850417   -.0209647
         Tp2 |  -.1404483   .0353782    -3.97   0.000    -.2097882   -.0711084
         Tp3 |  -.1069039   .0328865    -3.25   0.001    -.1713602   -.0424476
------------------------------------------------------------------------------

or from the raw DiD estimates, below,

Difference-in-difference with Multiple Time Periods

Number of obs = 2,500
Outcome model : least squares
Treatment model: inverse probability
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
g2004        |
 t_2003_2004 |  -.0145297   .0221292    -0.66   0.511     -.057902    .0288427
 t_2003_2005 |  -.0764219   .0286713    -2.67   0.008    -.1326166   -.0202271
 t_2003_2006 |  -.1404483   .0353782    -3.97   0.000    -.2097882   -.0711084
 t_2003_2007 |  -.1069039   .0328865    -3.25   0.001    -.1713602   -.0424476
-------------+----------------------------------------------------------------
g2006        |
 t_2003_2004 |  -.0004721   .0222234    -0.02   0.983    -.0440293     .043085
 t_2004_2005 |  -.0062025   .0184957    -0.34   0.737    -.0424534    .0300484
 t_2005_2006 |   .0009606   .0194002     0.05   0.961    -.0370631    .0389843
 t_2005_2007 |  -.0412939   .0197211    -2.09   0.036    -.0799466   -.0026411
-------------+----------------------------------------------------------------
g2007        |
 t_2003_2004 |   .0267278   .0140657     1.90   0.057    -.0008404     .054296
 t_2004_2005 |  -.0045766   .0157178    -0.29   0.771    -.0353828    .0262297
 t_2005_2006 |  -.0284475   .0181809    -1.56   0.118    -.0640814    .0071864
 t_2006_2007 |  -.0287814    .016239    -1.77   0.076    -.0606091    .0030464
------------------------------------------------------------------------------
Control: Never Treated

I hope you can help me and I am sorry for bothering you again!
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2467
#169

08 Jul 2022, 20:13

Originally posted by Gabor Mugge

To add: the following error message also appears after calling 'estat, event' with the 'long2' option specified beforehand:

Code:

csdid_event():  3301  subscript invalid
<istmt>:  -  function returned error

Hi Gabor
Sorry it took this long to answer.
I think the problem may come from the different way I am implementing drimp now. There are two options:
1) Because you have no covariates, you would do better using method(reg).
2) If you add covariates, you could use method(dripw), although you do not have enough observations to add more than 1 or 2 controls to your model.

Hope this helps
F
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2467
#170

08 Jul 2022, 20:18

Hi Leon
I think this has to do with the different way I am now estimating drimp, which is very sensitive. In addition, in earlier versions I wasn't using the correct control groups when estimating pre-treatment effects and when there were no never-treated observations.

If you are using no controls, you would do better with method(reg); if you do use controls, use method(dripw).
HTH
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2467
#171

08 Jul 2022, 20:40

Hi Georgina
When you use "cevent, window(#1 #2)" you are simply getting the weighted average of all ATTGTs with event times between #1 and #2.
For example, window(-3 -1) gives the average of all pre-treatment ATTGTs, weighted by how many observations were used in each ATTGT.
You can see the weights I use if you type "ereturn display".

Pre_avg and Post_avg differ in that they are simple averages of the event-study coefficients: Pre_avg, for example, gives the same weight to Tm3, Tm2, and Tm1.
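
To make the equal-weight point concrete, Pre_avg and Post_avg can be reproduced by hand from the estat event table in Georgina's post. A minimal sketch, using Python only as a calculator (the variable and function names are mine; the numbers are copied from that output):

```python
# Event-study coefficients copied from the `estat event` output in the thread.
pre = {"Tm3": 0.0267278, "Tm2": -0.0036165, "Tm1": -0.0232440}
post = {"Tp0": -0.0210604, "Tp1": -0.0530032, "Tp2": -0.1404483, "Tp3": -0.1069039}


def simple_average(coefs):
    """Equal-weight mean of the event-time coefficients."""
    return sum(coefs.values()) / len(coefs)


pre_avg = simple_average(pre)
post_avg = simple_average(post)

print(f"Pre_avg  = {pre_avg:.7f}")
print(f"Post_avg = {post_avg:.7f}")
```

Both values agree with the reported Pre_avg and Post_avg up to rounding of the displayed coefficients. ATTC from cevent, window(-3 -1) differs because it averages the underlying ATTGTs with weights instead of averaging the three event-time coefficients equally; the weights are not shown in this thread, so that number is not reproduced here.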

Hope this helps
Fernando
Comment
Leon Schmidt

Join Date: Apr 2018

Posts: 98
#172

11 Jul 2022, 02:35

Thanks a lot, Fernando, for this explanation!

So just to better understand, the csdid command estimates and averages ATTGTs but the user can choose different methodologies to do so (and traditional OLS is one of them for estimating the ATTGTs)? Is there any reference that formally discusses what you wrote above (use OLS when you have no controls)?

Thanks again!
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2467
#173

11 Jul 2022, 06:19

Hi Leon
Yes, one can choose different methodologies, but OLS (as it is usually implemented, as a TWFE) is NOT one of them. What the regression-based method does is, roughly, estimate an OLS for each cell (treated/untreated x before/after) and then get the DID as usual, using the predicted values.

For a formal reference, see Sant'Anna and Zhao (2020), which explains all the estimators; you can easily derive that all of them collapse into the standard 2x2 DID when there are no controls.
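
As a rough illustration of that last point (this is not the csdid implementation): with no covariates, an OLS of the outcome on a constant within each group-by-period cell just returns the cell mean, so the predicted values are cell means and the estimator reduces to the textbook 2x2 difference of differences. The outcomes below are made up for the example, and the function names are hypothetical.

```python
from statistics import mean

# Hypothetical outcomes by cell: (treated?, post?) -> observed values.
cells = {
    (1, 0): [10.0, 12.0, 11.0],  # treated, before
    (1, 1): [15.0, 16.0, 17.0],  # treated, after
    (0, 0): [9.0, 10.0, 11.0],   # untreated, before
    (0, 1): [11.0, 12.0, 13.0],  # untreated, after
}


def fitted_constant(values):
    """OLS on a constant only: the fitted value is the sample mean."""
    return mean(values)


def did_2x2(cells):
    """(treated after - treated before) - (untreated after - untreated before)."""
    pred = {key: fitted_constant(vals) for key, vals in cells.items()}
    return (pred[(1, 1)] - pred[(1, 0)]) - (pred[(0, 1)] - pred[(0, 0)])


att = did_2x2(cells)
print(att)
```

Once covariates enter, the cell-level fits are no longer plain means, which is where the choice among reg, dripw and drimp starts to matter.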
F
Comment
Doug Hassanali

Join Date: Sep 2018

Posts: 14
#174

16 Jul 2022, 18:00

Hi FernandoRios
I am reaching out to you in regard to your csdid Stata command for implementing staggered treatment. My problem is that when I run the code, I get back omitted results (see attached pdf).
I wanted to improve on my preliminary analysis using the new developments in DiD. The preliminary analysis used the following code:
reghdfe edyrtotal interactionspost_f5 [pweight= perweight] if byear>1983 & compprimSample==1, abs(ethnicityug religion distrikt birthyear) vce(cl clustervar)

Instead, I run the following code:
csdid edyrtotal interactionspost_f5 if byear>1983 & compprimSample==1, time(birthyear) gvar(first_treat ) method(dripw) reps(20) cluster(clustervar)

Note: the data (link) are a single survey wave treated as repeated cross-section (rc), with treated individuals born 1990-1997 (see the first_treat var) and controls born 1984-1989.

I do not know what I am getting wrong in my setup, and any guidance you might offer is highly appreciated.
Attached Files

toy_results.pdf (58.0 KB, 1 view)
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2467
#175

18 Jul 2022, 06:14

Hi Doug
I think the problem is that you are using a setup that is not compatible with csdid.
Basically, with reghdfe, you include the treatment-post interaction because that is what calculates the ATT.
With csdid, that is the same as using NO controls. Adding the interaction creates multicollinearity problems, causing it to crash, just as you report.
HTH
Comment
Doug Hassanali

Join Date: Sep 2018

Posts: 14
#176

18 Jul 2022, 22:56

Hi FernandoRios
Thanks for the feedback.
How do you reckon I should proceed in setting up the specification to take advantage of CSDID?
Note: my treatment/control indicator var is Post while treatment is non-binary (using an intensity measure - pIntenf5 and is by geographic area).
Treatment itself can be staggered based on cohorts or not (all born after 1989 are eligible for treatment)
Comment
Zara Contractor

Join Date: Jul 2022

Posts: 1
#177

21 Jul 2022, 17:59

Hi Fernando,

I was wondering how the reported ‘number of obs’ is calculated for the csdid command? I need to run additional analysis on the exact sample used in the csdid regression, so after the csdid command I ran ‘keep if e(sample)==1’, which keeps exactly as many observations as ‘number of obs’ indicated.

But I noticed that many observations were missing from the resulting sample: in particular, all the in-sample observations were from pre-treatment cohort-years only, though I can see from the output that post-treatment observations were also used. (To confirm this, I re-ran the csdid command only on the observations that were in-sample, and got very different results.)

Would really appreciate any help on this, thanks!

Last edited by Zara Contractor; 21 Jul 2022, 18:03.
Comment
noriko amanop

Join Date: Jul 2022

Posts: 3
#178

25 Jul 2022, 10:19

Hi FernandoRios,

Following up on Zara's question, the attached toy data (in which I use the names of mpdta.dta although it is NOT mpdta.dta) emphasizes the problem she pointed out.

In particular, the code

use "toy_mpdta.dta", clear

csdid lemp lpop lpop_sq, ivar(countyreal) time(year) gvar(first_treat) method(dripw) notyet
* Number of obs = 409 observations

keep if e(sample)==1
* but if we keep the 409 observations,
csdid lemp lpop lpop_sq, ivar(countyreal) time(year) gvar(first_treat) method(dripw) notyet
* Number of obs = 363 and the results change

leads to different estimates. Unlike Zara, I do not see that the in-sample observations were coming ONLY from pre-treatment cohort-years. However, I also need to run additional analysis on the exact sample used in the csdid regression and I don't know how to proceed.

Thank you so much for your thorough responses to all previous queries and any guidance with this would be extremely helpful.
Attached Files

toy_mpdta.dta (129.4 KB, 1 view)

Last edited by noriko amanop; 25 Jul 2022, 10:36.
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2467
#179

25 Jul 2022, 11:25

Hi Noriko
Thank you for the replicable example. It took some time, but I found the reason for this. Unfortunately, I don't see how to fix it within the code, other than by doing additional cross-checks on the data (making the code slower).
So here is the problem.
1) You have an extremely unbalanced dataset, which is normally not a problem because csdid uses locally balanced data.
2) The problem: your data is badly balanced.
What I mean by this is that many (possibly all??) of your units are not observed "WHEN" they were treated.

Code:

year   unit_id   countyreal   first_treat   lpop       lpop_sq    lemp
2006   6865      266          1998          5.379897   28.94329   5.990784
2010   550632    266          1998          4.394449   19.31118   18.51852

Consider the case above with countyreal=266. Based on your data this unit was treated in 1998, but you only observe it in 2006 and 2010. Technically, this data is unusable for DID.
You will also see that your data is not a panel either. If you try "xtset countyreal year" it will give you a warning. CSDID isn't catching that.

So: 1) make your data a panel, or treat it as repeated cross-section;
2) make sure that, if it is a panel, your units are observed in the year of treatment and in the year before treatment. Otherwise, the results will not make sense.
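
The second check can be sketched as follows. This is written in Python purely for illustration and is not something csdid does for you; the function name is hypothetical, and it assumes first_treat == 0 marks never-treated units, as in mpdta-style coding. Only the countyreal=266 rows come from the thread; the other rows are made up.

```python
from collections import defaultdict


def flag_unusable_units(rows):
    """Return ids of treated units that are not observed in both the
    treatment year and the year immediately before it.

    rows: iterable of (unit, year, first_treat) tuples;
    first_treat == 0 is assumed to mean never treated.
    """
    years_seen = defaultdict(set)
    first_treat = {}
    for unit, year, g in rows:
        years_seen[unit].add(year)
        first_treat[unit] = g

    flagged = []
    for unit, g in first_treat.items():
        if g == 0:
            continue  # never-treated units have no treatment year to check
        if g not in years_seen[unit] or (g - 1) not in years_seen[unit]:
            flagged.append(unit)
    return sorted(flagged)


rows = [
    (266, 2006, 1998), (266, 2010, 1998),                     # case from the thread
    (301, 2003, 2004), (301, 2004, 2004), (301, 2005, 2004),  # made up, well observed
    (412, 2003, 0), (412, 2004, 0),                           # made up, never treated
]
print(flag_unusable_units(rows))
```

Whether to drop the flagged units or to treat the data as repeated cross-section is a judgment call that depends on how many units are affected.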

HTH
Fernando
Comment
noriko amanop

Join Date: Jul 2022

Posts: 3
#180

26 Jul 2022, 01:18

Thank you so much, Fernando. This makes a lot of sense.

In my actual data, I do observe all units the year they were treated... However, I may not be able to observe them all the year before...

I will try to think of ways to address this, but I really appreciate your help. Thank you!!
Comment
