Diff in Diff: DRDID and CSDID

FernandoRios

Join Date: Apr 2014

Posts: 2487
#106

21 Apr 2022, 08:06

One more question, since it's not in the dataset you pointed out: how are you defining your gvar? There isn't a variable named year_first_fdl, and I have no idea what your "treatment" definition is here.
Comment
Alex Arasanz

Join Date: Apr 2022

Posts: 7
#107

21 Apr 2022, 08:32

The treatment is having a first female democratic leader (FDL): it takes the value 1 if country i had its first FDL in year t or in any year prior to t. This variable is not in the regression command line (I understood it shouldn't be there).

The gvar is coded as follows:
gen year_First_FDL=.
replace year_First_FDL=1979 if country_name== "United Kingdom" & year >= 1979. //Cohort 1979
.
.
.
replace year_First_FDL=0 if year_First_FDL==. //Not treated
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2487
#108

21 Apr 2022, 10:30

That is the problem.
For the United Kingdom, gvar should equal 1979 in every year, not 0 in the years before 1979 (as your code implies).
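A minimal sketch of the corrected coding, reusing the variable names from the post above (country_name, year, year_First_FDL). The second variant assumes a hypothetical 0/1 indicator fdl that equals 1 in treated country-years, and a hypothetical output name first_fdl; neither is named in the thread.

```stata
* Variant 1: assign the cohort year to ALL years of the treated country
gen year_First_FDL = 0                                   // 0 = never treated
replace year_First_FDL = 1979 if country_name == "United Kingdom"

* Variant 2 (hypothetical variable fdl = 1 in treated country-years):
* take the first treated year per country and spread it over every year
bysort country_name: egen first_fdl = min(cond(fdl == 1, year, .))
replace first_fdl = 0 if missing(first_fdl)              // never-treated countries

* Sanity check: gvar must be constant within each unit
bysort country_name (year): assert first_fdl == first_fdl[1]
```

The point in either variant is that the `year >= 1979` condition is dropped, so pre-treatment years of a treated country carry the cohort year rather than 0.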
Comment
Alex Arasanz

Join Date: Apr 2022

Posts: 7
#109

21 Apr 2022, 11:01

Indeed! Thank you very much!
Comment
John lenon

Join Date: Apr 2022

Posts: 9
#110

24 Apr 2022, 13:39

"estimates post: matrix has missing values"

I sometimes get this error for the event study or group ATT when I include certain combinations of covariates. How can I tackle this issue without deleting some important covariates from the regression?
Thank you
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2487
#111

25 Apr 2022, 03:50

First, try getting the latest update. It should have fixed that.
Second, choose the smallest 2x2 design and use that to play with the specification using OLS or logit.
HTH
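A rough sketch of both steps, assuming the update comes from SSC (as clarified later in the thread) and using hypothetical names: y (outcome), x1 and x2 (covariates), year, and gvar (0 = never treated). The cohort 2012 and the years 2011 and 2012 are placeholders for whichever 2x2 cell is smallest in your data.

```stata
* Refresh both packages from SSC (csdid relies on drdid)
ssc install drdid, replace
ssc install csdid, replace

* Cell sizes for every cohort-by-year pair
tab year gvar

* Smallest 2x2: one treated cohort vs. never treated, two adjacent years
preserve
keep if inlist(gvar, 0, 2012) & inlist(year, 2011, 2012)
gen byte treated = (gvar == 2012)
logit treated x1 x2 if year == 2011     // does the propensity model converge?
regress y x1 x2 if treated == 0         // rough check: are covariates dropped as collinear?
restore
```

If the logit fails to converge or covariates are omitted in this small cell, the same covariate set is likely what produces missing values in the full csdid run.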
Comment
Annie McGrew

Join Date: Apr 2022

Posts: 2
#112

25 Apr 2022, 22:23

Originally posted by FernandoRios

No, you can't, and there is no need.
Time fixed effects are used to "take care of differences across time". But with Callaway and Sant'Anna, you use the same years (pre and post) for the treated and control groups to obtain a given estimate, so there is no need to control for that.
Also, keep in mind that every time drdid is used (behind all the operations), you are only using 2 periods of time, so using trends would make little sense.
HTH
Fernando

Hi Fernando,

I have a follow-up question about this and a few other questions. I am estimating the following regression model:

xtreg pctfem audit_treat estab_size firm_size i.year i.naics#i.year i.district_code_num#i.year, fe r

I have a panel dataset of firms over time. I have four different treatment years in 2012, 2013, 2014 and 2015. My data is from 2010-2016. In this initial panel regression, I am controlling for establishment size, firm size (both of which change over time), as well as including year fixed effects, industry by year fixed effects and district by year fixed effects. I was able to estimate this regression using the Sun & Abraham estimator for staggered diff-in-diff using the following code:

eventstudyinteract pctfem audit_treat, ///
cohort(audit_fyear) control_cohort(never_treated) covariates(estab_size firm_size) ///
absorb(pid year i.district_code_num#i.year i.naics#i.year) vce(cluster pid)

where audit_fyear is the year of treatment (with never treated units missing) and never_treated = 0 for never-treated units and 1 for ever-treated units. I decided instead to estimate this model using csdid for two reasons: 1) I think that the Sun & Abraham method is only for estimating event studies and 2) because my outcome pctfem is taken into consideration for treatment assignment (audit_treat). Thus, firms with low values of pctfem may have been selected for treatment. Because of this, I wanted to implement inverse probability score reweighting and doubly robust methods to help balance my treated cohort and untreated cohort (which csdid allows you to do). I have run into two problems with implementing csdid:

1) As you mentioned in your previous post, you cannot include fixed effects for year and unit (here my firm identifier is pid). However, it seems I am also unable to use district-by-year and industry-by-year fixed effects. Is that true? Is there no option, such as absorb in reghdfe, to include multiple fixed effects? Maybe it is not necessary to use any fixed effects in this estimator and I am just not understanding...

2) I am getting almost all omitted coefficients when I run the following regression without the fixed effects:

csdid pctfem estab_size firm_size, ivar(pid) time(year) gvar(audit_fyear) method(dripw)

Here audit_fyear = 0 for never-treated units and equals the year of treatment for ever-treated units.

However, if I run the default csdid estimator (code below) I get results for all my coefficients.

csdid pctfem estab_size firm_size, ivar(pid) time(year) gvar(audit_fyear)

For both estimations I get the following error message in red before results are displayed:
"Panel is not balanced. Will use observations with Pair balanced (observed at t0 and t1)"

I think for this second problem, something is going wrong with my inverse probability weights. I have also tried to estimate inverse probability weights using teffects and by hand using probit, but my estimates are not converging. I am not sure if this is happening because teffects and probit may not work with panel data. However, if I change my covariates from estab_size and firm_size to pctexec and pctmanag, everything works. It is very strange... I am not sure how to explain it.

Finally, my last question is whether for the ipw estimator, the probability weights are being predicted by the covariates only in the pretreatment years or in all the years? If it is just in the pre-treatment year, is it possible to use my pre-treatment outcome to predict treatment?

Unfortunately, this is a restricted dataset so I cannot share the data file, but if you need any additional information, please let me know. Thanks in advance for any thoughts you might have!

Best,
Annie
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2487
#113

26 Apr 2022, 06:58

Hi Annie,

1) As you mentioned in your previous post, you cannot include fixed effects for year and unit (here my firm identifier is pid). However, it seems I am also unable to use district-by-year and industry-by-year fixed effects. Is that true? Is there no option, such as absorb in reghdfe, to include multiple fixed effects? Maybe it is not necessary to use any fixed effects in this estimator and I am just not understanding...

What you say is true. But let's back up a bit.
1) csdid does not allow you to explicitly include year and individual fixed effects because, the way it works, it automatically includes that information in the specification.
2) Specifically, a different regression is run for each cohort (gvar including gvar=0) and year (time var). This is implicitly similar to running the following:
reghdfe y c.(x1 x2 x3) i.gvar#i.time (i.gvar#i.time)#c.(x1 x2 x3), abs(pid)
which implies FULL interaction or heterogeneity as Wooldridge would say.
3) The only other kind of "fixed" effects you may want to consider is one that "groups" PID. For example, if your panel is of firms, your "group" is industry, and there are firms within each industry that were treated and not treated, then you could add those fixed effects to the model using "i.". The problem there is sample size. If your 2x2 has, say, 8 observations, adding industry FE will probably produce weird results.

Code:

2) I am getting almost all omitted coefficients when I run the following regression without the fixed effects:
csdid pctfem estab_size firm_size, ivar(pid) time(year) gvar(audit_fyear) method(dripw)
here audit_fyear = 0 for never-treated units and equals the year of treatment for ever-treated units.
However, if I run the default csdid estimator (code below) I get results for all my coefficients.
csdid pctfem estab_size firm_size, ivar(pid) time(year) gvar(audit_fyear)

My guess here is that you do not really have much variation by gvar and time, so fitting a doubly robust model is tough. Specifically, the logit step of dripw is having problems converging.
I am somewhat surprised the other one works, since it would normally run into similar errors. Hard to say more without seeing the data.

Code:

For both estimations I get the following error message in red before results are displayed:
"Panel is not balanced. Will use observations with Pair balanced (observed at t0 and t1)"
I think for this second problem, something is going wrong with my inverse probability weights. I have also tried to estimate inverse probability weights using teffects and by hand using probit, but my estimates are not converging. I am not sure if this is happening because teffects and probit may not work with panel data. However, if I change my covariates from estab_size and firm_size to pctexec and pctmanag, everything works. It is very strange... I am not sure how to explain it.

The problem here is a misunderstanding about the samples used. If you would like to replicate CS by hand, you need to start with something like the following:
First, run: tab time gvar
This will give you a general idea of the sample sizes you are dealing with. Take any 2 periods (time) and 2 gvar groups (0 and one other). That is the sample you are working with behind the scenes when using csdid.
If your model does not work there, then you need to rethink your model specification.
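The by-hand replication described above might look like the sketch below. The names y, x1, x2, id, year, and gvar are hypothetical, as is the choice of cohort 2013; the years 2012 and 2013 correspond to the cell ATT(2013, 2013) with never-treated units as controls.

```stata
* Cell sizes by period and cohort (gvar = 0 is never treated)
tab year gvar

preserve
* One 2x2: cohort 2013 vs. never treated, base year g-1 = 2012 and target year 2013
keep if inlist(gvar, 0, 2013) & inlist(year, 2012, 2013)
gen byte treated = (gvar == 2013)

* Doubly robust DiD (IPW + OLS) on this single 2x2
drdid y x1 x2, ivar(id) time(year) tr(treated) dripw
restore
```

Under these assumptions the estimate should be close to the corresponding ATT(g,t) reported by csdid with method(dripw). If drdid fails on this subsample, the full csdid call can be expected to fail for the same cell.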

Code:

Finally, my last question is whether for the ipw estimator, the probability weights are being predicted by the covariates only in the pretreatment years or in all the years? If it is just in the pre-treatment year, is it possible to use my pre-treatment outcome to predict treatment?

Yes, it is possible to use the pre-treatment outcome to predict treatment and the scores (and IPW); whether it is valid is a different question. Incidentally, that is what CS do in their example.
Regarding the IPW: it uses only pretreatment data, but not ALL pretreatment years.
For example, if you are focused on, say, ATT(G,T) and T>=G, the pretreatment covariates come from period G-1. But if T<G, then the pretreatment covariates come from period T-1 (for the default option). So the definition of pretreatment covariates depends on G and T.
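To make the rule concrete, here is a small self-contained sketch that computes the period the covariates are taken from for a few (G,T) pairs under the default option. The variable names g, t, and base are purely illustrative.

```stata
clear
input g t
2012 2011
2012 2012
2012 2014
2014 2012
end

* Post-treatment cells (t >= g) use g-1; pre-treatment cells (t < g) use t-1
gen base = cond(t >= g, g - 1, t - 1)
list g t base
* base is 2010, 2011, 2011, 2011 for the four rows above
```

Note that the first and last rows are pre-treatment cells, so their base period moves with t rather than staying fixed at g-1.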

HTH
Comment
John lenon

Join Date: Apr 2022

Posts: 9
#114

26 Apr 2022, 16:26

Originally posted by FernandoRios

First, try getting the latest update. It should have fixed that.
Second, choose the smallest 2x2 design and use that to play with the specification using OLS or logit.
HTH

Do you mean the ado files on your GitHub? I suppose I am using them. Can you provide the link in case I am using the wrong file?

https://friosavila.github.io/playing...ain_csdid.html

I have a few more questions.

1) The CS paper uses conditional parallel trends, which means the parallel trends are valid if I control for covariates. Is that right?

2) The CS model does not allow treatment to start at zero. Does the CS model also neglect the observations that are treated at the end of the period? I have a control group at the end of the period too, but I think the model is neglecting the states that are treated at the end of the period.

3) There are some observations lost too when I add covariates. Is it just the model and covariates used causing the 2*2 model to lose some observations? Or is there a way to fix it so I don't lose more observations?

Thank you
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2487
#115

26 Apr 2022, 20:14

Sorry, I meant the last ones from SSC.

For the others:
1) CS allows you to relax the unconditional parallel trends assumption to a conditional PTA. Parallel trends may still not be valid even if you use controls.
2) That is by design. G>0, because I use 0 to denote the never treated. It shouldn't drop those treated in the last period. I would need to see more of the data to understand the problem.
3) I will need to know more. There is not enough information to say why adding covariates in your case is causing observations to be dropped.
HTH
Comment
Samuel Nocito

Join Date: Apr 2020

Posts: 12
#116

02 May 2022, 01:40

Dear FernandoRios,

I updated the command csdid as you suggested in order to use the new option "long2" for universal base event studies. However, now I am experiencing an anomaly when I use the option "agg(simple)". Basically, when I run:

Code:

csdid y, ivar(id) time(t) gvar(cohort) notyet cluster(cls) agg(simple)
estat simple

I get two different coefficient estimates: one from csdid and a different one from the "estat" command. Shouldn't they be the same?
Thank you very much in advance for your help!
Best,
Samuel
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2487
#117

02 May 2022, 03:53

They should be.
I'll check that out,
but for now use the ones from estat simple.
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2487
#118

02 May 2022, 04:41

After a quick check I found no difference. Can you prepare a replicable example so I can examine it?
Thank you
Comment
Samuel Nocito

Join Date: Apr 2020

Posts: 12
#119

03 May 2022, 08:35

Dear FernandoRios,

Thank you very much for your fast reply and your check. I haven't had much time to prepare a replicable example so far, sorry. However, I guess this might be related to the dichotomous nature of my outcome and covariate.
I'll try to figure it out and, if needed, I'll prepare a replicable example. Thank you very much again for your help!
Comment
Annie McGrew

Join Date: Apr 2022

Posts: 2
#120

05 May 2022, 15:01

Hi Fernando -- thank you so much for your response, this has been really helpful. I have a few follow-up questions.

on adding fixed effects:

1) csdid does not allow you to explicitly include year and individual fixed effects because the way it works it automatically includes that information in the specification.
2) Specifically, a different regression is run for each cohort (gvar including gvar=0) and year (time var). This is implicitly similar to running the following:
reghdfe y c.(x1 x2 x3) i.gvar#i.time (i.gvar#i.time)#c.(x1 x2 x3), abs(pid)
which implies FULL interaction or heterogeneity as Wooldridge would say.
3) The only other kind of "fixed" effects you may want to consider is one that "groups" PID. For example, if your panel is of firms, your "group" is industry, and there are firms within each industry that were treated and not treated, then you could add those fixed effects to the model using "i.". The problem there is sample size. If your 2x2 has, say, 8 observations, adding industry FE will probably produce weird results.

I guess my question here is: if I want to estimate the following equation using csdid, what would be the right specification?

Code:

xtreg `var' audit_treat estab_size firm_size pcths i.year i.naics#i.year i.district_code_num#i.year, fe r

Would it be this specification?:

Code:

csdid `var' estab_size firm_size pcths i.naics i.district_code_num, ivar(pid) time(year) gvar(audit_fyear_csdid) agg(simple)

So I guess more specifically I am asking if i.naics#i.year in a regular panel regression is the same as adding just i.naics in the csdid estimation.

If so: when I add in i.district_code_num, the code runs fine. However, when I add in i.naics, it displays an error message that says "estimates post: matrix has missing values." This happens even after I drop all observations with missing values for naics. What does this mean?

on using pre-treatment outcomes for IPW

Yes, it is possible to use the pre-treatment outcome to predict treatment and the scores (and IPW); whether it is valid is a different question. Incidentally, that is what CS do in their example.
Regarding the IPW: it uses only pretreatment data, but not ALL pretreatment years.
For example, if you are focused on, say, ATT(G,T) and T>=G, the pretreatment covariates come from period G-1. But if T<G, then the pretreatment covariates come from period T-1 (for the default option). So the definition of pretreatment covariates depends on G and T.

How do I tell the csdid estimator to use the pre-treatment outcome for the propensity score estimation? It seems like csdid uses the same X variables in both the propensity score estimation and the subsequent regression. Obviously, I do not want my pre-treatment outcome to be in the subsequent regression; I only want to use it as part of the probability weighting. Is it possible to do that?

on event study coefficients:
My final question is about the event study coefficients. I am confused about why there is not an omitted category. for example, for my event study using TWFE I omit T = 0, the year of treatment. I saw another post on statalist about this where you said the following:

CSDID uses a different identification for the event studies.
If you are looking at periods AFTER treatment, the effect is measured as:
E(DY|t)-E(DY|g-1) (or as you say, using the last period before first treatment)
But for periods before treatment it does
E(DY|t)-E(DY|t-1)

There is no explicit base line or omitted category.

Could you elaborate on this a bit? I am still confused.

Thanks so much!
Annie

Last edited by Annie McGrew; 05 May 2022, 15:06.