  • csdid "long gaps"

    One option in csdid is "long gaps", which the documentation defines as: "For periods before treatment, this option requests the estimation of Long gaps, rather than short-gaps."

    This is unclear to me. What exactly does this do, and when should I use this option?

  • #2
    [model answer]
    It seems to me that the answer would be the following:
    Left column: short
    middle column: long
    right column: long + notyet
    Unit of obs: id
    year: 1984-1993
    data source: Frontiers-in-DID/Exercises/Exercise-1 at main · Mixtape-Sessions/Frontiers-in-DID (github.com)
    code: csdid income , ivar(id) gvar(group) time(year) short [/long]
    attachment: full output
    [Screenshot of csdid output: Screenshot 2023-11-24 103410.png]

    For periods before treatment, -long- no longer compares a treated group with the comparison group (never treated, the default in csdid) at t-1, as -short- does for every period from the first period up to the period before the group is treated. [left vs. middle column]
    After treatment, however, -long- also compares the treated group with the comparison group (never treated) at t-k. [left vs. middle column]

    Is this a correct explanation? If so, why is there no estimate when 1991 is compared with 1988 for group==1992 (last row)? Likewise, why is 1990 (1989) not compared with 1989 (1988) for this group?


    • #3
      I think the answer to #1 and #2 is:
      1989 is a missing time period in the data. There is no group = 1989, and we also do not know whether anyone was treated in 1989 (e.g., they might have been dropped from the sample altogether for one reason or another). We also have no observations for year = 1989, perhaps because data were not collected in that year.
      The output for group=1990 is:
      [Screenshot of csdid output for group = 1990: Screenshot 2023-11-24 123208.png]

      So in summary:

      short: compares a treated group with the untreated (never treated, plus not yet treated if requested) at t and t-1 for all t <= g, where g is the group (= first_treat). For all t >= g, treated and control are compared at t and at t = g-1.
      long: does something similar, except that for the pre-treatment periods (t < g) it no longer compares treated and control at t and t-1; instead, it compares t = g-1 (one period before the first treatment) with each earlier period t = g-k, for all possible values of k.

      Both long and short: if the period t = g-1 is missing, -csdid- cannot use g-1 as the base period (because there are no observations in that period). It nevertheless goes on to compare every later period t >= g with period g-1, apparently attempting each comparison even though the previous one could not be computed.

      Conclusion: if a time period is missing, you won't obtain any estimate that requires it (which you would surely expect), but this explains why there are so many (unsuccessful) post - (g-1) comparisons.
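      To make the summary above concrete, here is a minimal Python sketch (not csdid's actual code) that lists, for a group first treated in g, which base period each period t would be compared with under short and long gaps. It assumes, as observed above, that a comparison whose base period is missing from the data simply yields no estimate (None here). The function name comparison_pairs is hypothetical, and the years mimic the exercise data with 1989 missing.

```python
def comparison_pairs(g, periods, gaps="short"):
    """Map each period t to the base period it is compared with for group g.

    Hypothetical sketch of the rules described in this thread, not csdid's code.
    A value of None means the base period is not in the data, so no
    estimate can be produced for that comparison.
    """
    observed = set(periods)
    first = min(periods)
    pairs = {}
    for t in sorted(periods):
        if gaps == "short" and t < g:
            base = t - 1          # short gaps: consecutive pre-treatment periods
            if base < first:
                continue          # the first period has nothing before it
        else:
            base = g - 1          # long gaps before treatment; both options after treatment
            if t == base:
                continue          # g-1 is the reference period itself
        pairs[t] = base if base in observed else None
    return pairs


# Exercise-style panel: 1984-1993 with 1989 missing.
periods = [y for y in range(1984, 1994) if y != 1989]

print(comparison_pairs(1990, periods, "short"))
print(comparison_pairs(1992, periods, "long"))
```

      With g = 1990 every post-treatment comparison needs 1989 as its base, so all of them come back as None, which matches the many unsuccessful post - (g-1) comparisons described above.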

      Follow-up question:
      If we have "gaps" in the data, but we know that some group was treated during a gap (i.e., 1989 in this case), is this information still useful?
      I replaced the always-treated group (1984, which -csdid- excludes from the computations anyway) with 1989 (the missing time period), and got the following output for g = 1989:
      [Screenshot of csdid output for g = 1989: Screenshot 2023-11-24 132031.png]

      It turns out that it is (I would say), because the aggregated Post_avg and Pre_avg differ (see below; the full output is attached as "csdid_1989_1990").
      When we treat this group as already treated in 1989 (column 1), the results differ from those obtained when we assume it was first treated in 1990 (column 2), e.g., because 1990 is the first time we observe this group as treated. An example of such a case is a pension reform, where we would have biennial data (e.g., HRS), but we know the year in which someone becomes age-eligible for a pension (e.g., because they cross the age threshold of 60).
      [Screenshot of Pre_avg and Post_avg, columns 1 and 2: Screenshot 2023-11-24 134615.png]

      What is the correct approach in such cases, and how does csdid handle this complication?


      • #4
        There is some confusion here:
        1. "not yet treated" and "never treated" refer to how the control groups are defined.
        2. The default and the long2 option refer to how the timing of the data is used to estimate pre-treatment effects.

        For point 1: the (not-yet-treated) controls are all units that have not been treated by time t, which includes the never treated.
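        A small Python sketch of point 1, under stated assumptions: never-treated units are coded 0 in the hypothetical first_treat mapping, there is no anticipation, and a not-yet-treated unit must be untreated in both periods being compared. It is one simple reading of the rule above, not csdid's implementation.

```python
def control_units(first_treat, g, t, base, notyet=True):
    """Units usable as controls when comparing period t with period base for group g.

    first_treat maps unit -> first treatment period, with 0 meaning never treated
    (a coding assumed for this sketch). With notyet=False only never-treated
    units are returned.
    """
    later = max(t, base)
    controls = []
    for unit, gi in first_treat.items():
        if gi == 0:
            controls.append(unit)      # never treated: always a control
        elif notyet and gi != g and gi > later:
            controls.append(unit)      # not yet treated in either compared period
    return controls


first_treat = {"a": 0, "b": 1990, "c": 1992, "d": 1993}
print(control_units(first_treat, g=1990, t=1991, base=1989))                # ['a', 'c', 'd']
print(control_units(first_treat, g=1990, t=1992, base=1989))                # ['a', 'd']
print(control_units(first_treat, g=1990, t=1991, base=1989, notyet=False))  # ['a']
```

        Unit c leaves the control set once the comparison reaches 1992, the year it is treated itself; with notyet=False only the never-treated unit a remains.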

        For point 2:
        The idea of the parallel trends assumption (PTA) here is that, before treatment, treated and control units should follow the same trend. Thus, if you estimate ATTs for pre-treatment periods, the effect should be zero, regardless of which periods you use, as long as both are before treatment.

        The CS default uses short gaps (DID between t-1 and t).
        Standard event studies use long gaps (DID between t and g-1).

        If some estimates are skipped, it is because there is no data to estimate them.
        hth



        • #5
          Thank you, Fernando,

          In my follow-up question in #3, I asked how individuals should be handled if they were treated during periods in which we have no outcomes.
          Suppose we have outcomes in 2011, 2013, 2015 and 2018, but we know whether individuals were treated in 2012, 2014, 2016 or 2017. From the example in #3, I concluded that this knowledge is valuable and should be taken into account, even though we have no outcomes in these periods.

          However, if we do not know whether they were treated in these unobserved years (2012, 2014, 2016, 2017), the treatment would 'turn on' when we first observe them as treated, i.e., in 2013, 2015 or 2018.
          Hence, my question: if this scenario applies and we do not have this information, how should we handle these individuals (for whom we do not know whether they were treated in the unobserved periods)?
          In this case, there is no way to tell who was first treated in, e.g., 2012 and who was first treated in 2013, so I wonder what the 'correct' approach would be.

          These two cases give different Pre_avg and Post_avg, as included in the output at #3.
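          The second coding can be sketched in Python: a hypothetical helper maps the actual treatment year to the first survey year at or after it, which is the group a unit would be assigned if treatment is only seen when it is first observed. The survey years are the ones from the example above; the helper name is illustrative only.

```python
from bisect import bisect_left


def first_observed_treated(treat_year, survey_years):
    """Return the first survey year at or after treat_year (None if there is none)."""
    years = sorted(survey_years)
    i = bisect_left(years, treat_year)  # index of the first year >= treat_year
    return years[i] if i < len(years) else None


survey_years = [2011, 2013, 2015, 2018]
actual_years = [2012, 2013, 2014, 2016, 2017]

# Coding 1 (not computed here): group = actual treatment year, if it is known.
# Coding 2: group = first survey year in which the unit is seen as treated.
coding_2 = {y: first_observed_treated(y, survey_years) for y in actual_years}
print(coding_2)  # {2012: 2013, 2013: 2013, 2014: 2015, 2016: 2018, 2017: 2018}
```

          Under the second coding, units treated in 2012 and in 2013 both end up in group 2013, which is why the two cases cannot be told apart without the extra information.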



          • #6
            Either approach requires its own set of assumptions,
            and I don’t know which one is more valid.
            I would stick with one and explain why it makes sense.
