csdid "long gaps"

Castor Comploj

Join Date: Mar 2021

Posts: 91
#1

24 Nov 2023, 02:15

An option in csdid is to use "long gaps", which in the documentation is defined as "For periods before treatment, this option requests the estimation of Long gaps, rather than short-gaps."

This is unclear to me. What exactly does this option do, and when should I use it?

Castor Comploj

Join Date: Mar 2021

Posts: 91
#2

24 Nov 2023, 02:48

[model answer]
It seems to me that the answer would be the following:
Left column: short
middle column: long
right column: long + notyet
Unit of obs: id
year: 1984-1993
data source: Frontiers-in-DID/Exercises/Exercise-1 at main · Mixtape-Sessions/Frontiers-in-DID (github.com)
code: csdid income , ivar(id) gvar(group) time(year) short [/long]
attachment: full output

For periods before treatment, -long- ensures that a treated group is no longer compared with the comparison group (never treated, the default in csdid) between t and t-1 for every period before the treatment date, back to the first period, as is done in -short-. [left vs. middle column]
After treatment, however, -long- also compares a treated group with the comparison group (never treated) relative to an earlier period t-k. [left vs. middle column]

Is this a correct explanation? If so, why is there no estimate when 1991 is compared to 1988 for group==1992 (last row)? Likewise, why is 1990(1989) not compared to 1989(1988), for this group?
Attached Files

csdid_long_short_notyet.smcl (73.6 KB, 1 view)
Comment
Castor Comploj

Join Date: Mar 2021

Posts: 91
#3

24 Nov 2023, 05:59

I think the answer to #1 and #2 is:
1989 is a missing time period in the data. There is no group = 1989, and we do not know whether anyone was treated in 1989 (such units may have been dropped from the sample altogether for one reason or another). There is also no time period year = 1989, perhaps because no data were collected in that year.
The output for group=1990 is:

So in summary:

short: for all pre-treatment periods t < g (g = group = first_treat), compares a treated group with the untreated (never treated, optionally plus not yet treated) between t and t-1. For all t >= g, the change in the outcome between t and g-1 is compared between the treated group and the controls.
long: does something similar, except that for the pre-treatment periods (t < g) it no longer compares treated and control between t and t-1. Instead, it compares g-1 (one period before first treatment) with each earlier period g-k, for all possible values of k.

both long and short: if g-1 is a missing time period, -csdid- cannot use g-1 as the base period (because there are no observations in it). It nevertheless goes on to attempt the comparison of every later period t >= g with g-1, as if the difference might be computable for a later t even though the previous attempt failed.

Conclusion: If you have missing time periods, you won't obtain an estimate for the comparisons that rely on them (which is to be expected), and this explains why the output contains so many unsuccessful Post vs. (g-1) comparisons.
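
To make the summary above concrete, here is a minimal Python sketch that lists which (t, base period) pairs each rule would try for a given cohort, and which of them can be estimated when 1989 is not observed. It only illustrates the rules as described in this post; `comparisons` is a hypothetical helper, not csdid code, and the assumption that an unobserved base period yields no estimate is taken from the discussion here rather than from the csdid source.

```python
def comparisons(group, periods, long_gaps=False):
    """List (t, base, estimable) tuples for one treated cohort.

    Hypothetical helper that mirrors the verbal description in the thread.
    It is NOT csdid's implementation.
    """
    observed = set(periods)
    rows = []
    for t in sorted(observed):
        # Post-treatment periods always use g-1 as the base period.
        # With long gaps, pre-treatment periods use g-1 as well.
        # With short gaps, pre-treatment periods use t-1.
        base = group - 1 if (t >= group or long_gaps) else t - 1
        if t == base:
            continue  # a period is never compared with itself
        # Assumption: no estimate is possible if the base period is unobserved.
        rows.append((t, base, base in observed))
    return rows


# Years 1984-1993 as in the example data, with 1989 not observed.
years = [y for y in range(1984, 1994) if y != 1989]

for g in (1990, 1992):
    for long_gaps in (False, True):
        label = "long" if long_gaps else "short"
        ok = [(t, b) for t, b, est in comparisons(g, years, long_gaps) if est]
        print(g, label, ok)
```

For group 1990 the base period g-1 = 1989 is unobserved, so no post-treatment comparison (and, with long gaps, no comparison at all) is flagged as estimable in this sketch.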

Follow-up question:
If we have "gaps" in the data, but we know that some group was treated during the gap (i.e. 1989 in this case), is this information still useful for us?
I have replaced the always treated group (1984 - which is anyway excluded from computations in -csdid-) with 1989 (the missing time period), and we get the following output for g =1989:

It turns out yes (I would say), because the aggregated Post_avg and Pre_avg differ (see below, full output attached as "csdid_1989_1990").
If we consider this group to have been treated in 1989 already (column 1), the results differ from the case in which we assume it was first treated in 1990 (column 2), e.g. because that is the first time we observe the group as being treated. An example of such a case is a pension reform, where we would have biennial data (e.g. HRS) but know the year in which someone becomes age-eligible for a pension (e.g. because they cross the age threshold of 60 years).

What is the correct approach in such cases, and how does csdid handle this complication?
Attached Files

csdid_1989_1990.txt (2.6 KB, 1 view)
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2470
#4

24 Nov 2023, 08:02

There is some confusion here:
1. "Not yet treated" and "never treated" refer to how the control group is defined.
2. The default and long2 options refer to how the timing of the data is used to estimate pre-treatment effects.

On point 1: the controls are all units that have not been treated up to time t, which includes the never treated.

On point 2: the idea behind the PTA (parallel trends assumption) is that, before treatment, treated and control units should follow the same trend. Thus, if you were to estimate ATTs for pre-treatment periods, the effect should be zero, regardless of which periods you use, as long as both are before treatment.

The CS default uses short gaps (DiD between t-1 and t).
Standard event studies use long gaps (DiD between t and g-1).
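
As a rough sketch of the two definitions, written without covariates and with C denoting whichever comparison group is chosen (never treated or not yet treated), the pre-treatment quantities can be written as below. The notation and the sign convention for the long-gap case are assumptions made for illustration; csdid's output may order the difference differently.

```latex
\begin{align*}
% Pre-treatment periods, t < g
ATT^{\text{short}}(g,t) &= E[\,Y_t - Y_{t-1} \mid G = g\,] - E[\,Y_t - Y_{t-1} \mid C\,] \\
ATT^{\text{long}}(g,t)  &= E[\,Y_t - Y_{g-1} \mid G = g\,] - E[\,Y_t - Y_{g-1} \mid C\,] \\
% Post-treatment periods, t >= g (identical under both options)
ATT(g,t)                &= E[\,Y_t - Y_{g-1} \mid G = g\,] - E[\,Y_t - Y_{g-1} \mid C\,]
\end{align*}
```

Under parallel trends, both pre-treatment versions should be zero; they differ only in which earlier period serves as the base.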

If some estimates are skipped, it is because there is no data to estimate them.
hth
Comment
Castor Comploj

Join Date: Mar 2021

Posts: 91
#5

24 Nov 2023, 09:38

Thank you Fernando,

In the follow-up question above at #3, I ask about how individuals should be handled if they were treated during periods in which we have no outcomes.
Suppose we have outcomes in 2011, 2013, 2015 and 2018, but we also know whether individuals were treated in 2012, 2014, 2016 or 2017. From the example at #3, I concluded that this information is valuable and should be accounted for, even if we have no outcomes in these periods.

However, if we do not know if they were treated in these unobserved years (2012, 2014, 2016, 2017), the treatment would 'turn on' when we first observe them as being treated, i.e. in 2013, 2015, 2018.
Hence, I ask: If this scenario applies and we do not have this information, how should we handle these individuals (for whom we do not know if they were treated in the unobserved periods)?
In this case, there will be no way to identify who is first treated in e.g. 2012 and who is first treated in 2013, so I wonder what would be the 'correct' approach in such cases.

These two cases give different Pre_avg and Post_avg, as included in the output at #3.
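
The two codings can be sketched as follows, using the survey years from the example above. `first_observed_treated` is a hypothetical helper, and the sketch only shows how the cohort variable would differ between the two cases; it does not say which coding is appropriate.

```python
from bisect import bisect_left


def first_observed_treated(treat_year, observed_periods):
    """Map a true first-treatment year to the first survey wave at or after it.

    Hypothetical helper: this is the cohort we would assign if the exact
    treatment year were unknown and treatment "turned on" at the first
    wave in which the unit is observed as treated.
    """
    periods = sorted(observed_periods)
    i = bisect_left(periods, treat_year)
    # None means the unit is never observed as treated within the sample.
    return periods[i] if i < len(periods) else None


waves = [2011, 2013, 2015, 2018]  # years with outcome data

# Coding A: use the known treatment year (2012, 2014, 2016, 2017) as the cohort.
# Coding B: use the first wave in which treatment is observed.
for year in (2012, 2014, 2016, 2017):
    print(year, "->", first_observed_treated(year, waves))
```

Under coding B, units treated in 2016 and in 2017 end up in the same 2018 cohort, which is one reason the aggregated Pre_avg and Post_avg can differ between the two cases.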
Comment
FernandoRios

Join Date: Apr 2014

Posts: 2470
#6

24 Nov 2023, 10:12

Either approach requires its own set of assumptions,
and I don't know which one is more valid.
I would stick with one and explain why it makes sense.
Comment
