  • Staggered Difference-in-Difference and Parallel Trends

    Hi,

    My data are set up as follows. I have birth-level data from two survey rounds, 2015-16 and 2019-21. The treatment was rolled out at the district level in three phases. For example, of 600 districts in total, 300 were treated in Phase 1 (Dec 2017 to Apr 2018), 200 in Phase 2 (May 2018 to Dec 2018), and 100 in Phase 3 (Jan 2019 to Dec 2019). By Jan 2020, all districts had been treated. I want to run a staggered DID and compare children's outcomes between the pre- and post-treatment periods.

    Can I run the following regression for the same?

    Y_idt = α + β·Treat_dt + Z_s + X_idt + θ_t + ε_idt
    where i = child, d = district, t = birth year,
    and Treat_dt switches from 0 to 1 once district d has been treated by time t.

    Can I use csdid in this case?

    Also, in this setup, how do I check the parallel trends assumption? I am unsure what Stata code to use for this.

    Please help as I am new to this staggered DID literature.

    Thanks in advance.

  • #2
    It looks like the treatments mostly occurred in the window between 2016 and 2019, so staggered DID is probably not useful. Your data are basically split into pre- and post-treatment periods (with the exception of the last group, which you could exclude). You could instead run a generalized DID that includes time since treatment (a treat dummy plus timesincetreat*treat), which would be easier than staggered DID. You might still include the 2020-treated group, since 2/3 of the years are treated.
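To make the two regressors suggested above concrete, here is a minimal sketch of how the treat dummy and the timesincetreat*treat interaction could be built for one birth record. The thread contains no code, so Python is used purely for illustration. The phase start dates are read off the rollout in the first post, and using the first month of each phase as the treatment date is an assumption, as are the names `PHASE_START` and `did_regressors`.

```python
# Assumed phase start dates as (year, month), taken from the rollout described
# in the first post. Using the first month of each phase as the treatment date
# is an assumption made only for this illustration.
PHASE_START = {1: (2017, 12), 2: (2018, 5), 3: (2019, 1)}


def did_regressors(phase, birth_year, birth_month):
    """Return (treat, timesincetreat * treat) for one birth record.

    Time since treatment is measured in months and is negative for births
    before the district's phase started; the interaction zeroes those out.
    """
    start_year, start_month = PHASE_START[phase]
    months_since = (birth_year - start_year) * 12 + (birth_month - start_month)
    treat = 1 if months_since >= 0 else 0
    return treat, months_since * treat


# Hypothetical births: (phase of the district, birth year, birth month)
for phase, year, month in [(1, 2018, 3), (2, 2018, 5), (3, 2018, 6)]:
    print(phase, year, month, did_regressors(phase, year, month))
```

Whether months or years are the right time unit depends on how precisely birth dates and rollout dates are recorded in the actual data.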


    • #3
      Thank you so much for your response, and sorry for the late reply. Can we check for parallel trends in this case? If so, how?
      If not, how can I control for pre-trends?


      • #4
        The data look like this, I think: three groups (G1, G2, G3) and 5 periods.

        Period:  1          | 2          3          4          | 5
                 Observe Y  | treat_G1   treat_G2   treat_G3   | Observe Y
                            |         DON'T OBSERVE Y          |

        All your observations are treated by the end. There is no untreated group, so you have diff-in-diff across treatment timing only. There is no obvious approach to PP analysis.
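The point about having comparisons across treatment timing only can be checked by enumeration. The sketch below assumes the layout in this post (G1, G2, G3 first treated in periods 2, 3, 4, with Y observed only in periods 1 and 5) and lists every post-treatment (group, period) cell that still has a not-yet-treated group to compare against. The function and variable names are hypothetical.

```python
# Period in which each group is first treated, following the layout sketched
# in this post. Y is assumed to be observed only in periods 1 and 5.
FIRST_TREATED = {"G1": 2, "G2": 3, "G3": 4}
PERIODS = range(1, 6)
OBSERVED = {1, 5}


def not_yet_treated_controls(group, period):
    """Groups other than `group` that are still untreated in `period`."""
    return sorted(
        g for g, start in FIRST_TREATED.items() if g != group and start > period
    )


def identified_cells(observed_only=False):
    """Post-treatment (group, period, controls) cells with at least one control."""
    cells = []
    for group, start in FIRST_TREATED.items():
        for period in PERIODS:
            if period < start:
                continue  # pre-treatment period for this group
            if observed_only and period not in OBSERVED:
                continue  # outcome not observed in this period
            controls = not_yet_treated_controls(group, period)
            if controls:
                cells.append((group, period, controls))
    return cells


print(identified_cells())                    # cells that exist on paper
print(identified_cells(observed_only=True))  # cells where Y is also observed
```

Under these assumptions the only cells with controls fall in periods 2 and 3, where Y is not observed, and the second list is empty. If outcomes are in fact observed in the middle periods, the conclusion changes.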

        Interesting problem.




        • #5
          Sorry if I wasn't clear earlier. I should have given the details properly.
          My data looks like this-
          Unit ID (i)   District category (d)   Year (t)   Treat
          001           1                       2015       0
          002           1                       2016       0
          003           1                       2017       1
          004           2                       2015       0
          005           2                       2018       1
          006           2                       2016       0
          007           3                       2017       0
          008           3                       2019       1
          009           3                       2018       0
          010           3                       2020       1
          011           1                       2019       1
          012           1                       2018       1
          013           2                       2021       1
          014           2                       2019       1
          015           3                       2016       0
          016           3                       2015       0
          Here, Unit ID is the child's ID. District category is the category I assigned to each district according to the phase (the phases I mentioned earlier) in which it was treated. Year is the child's year of birth. Treat is a dummy variable equal to 1 if district d had been treated by time t, and 0 otherwise.
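As a quick consistency check on the Treat definition, the sketch below rebuilds the dummy from district category and birth year for the sample rows. The first-treatment years (2017, 2018, 2019 for categories 1, 2, 3) are inferred from those rows and are an assumption, not something stated in the post. A first-treatment-year variable of this kind is also what staggered estimators typically take as the cohort variable (for example, the `gvar()` option of `csdid`).

```python
# First treated birth year per district category, inferred from the sample
# rows in this post (an assumption, not stated by the poster).
FIRST_TREAT_YEAR = {1: 2017, 2: 2018, 3: 2019}

# (unit id, district category, birth year) copied from the sample rows
ROWS = [
    ("001", 1, 2015), ("002", 1, 2016), ("003", 1, 2017), ("004", 2, 2015),
    ("005", 2, 2018), ("006", 2, 2016), ("007", 3, 2017), ("008", 3, 2019),
    ("009", 3, 2018), ("010", 3, 2020), ("011", 1, 2019), ("012", 1, 2018),
    ("013", 2, 2021), ("014", 2, 2019), ("015", 3, 2016), ("016", 3, 2015),
]


def treat(category, year):
    """1 if the district category has been treated by the birth year, else 0."""
    return int(year >= FIRST_TREAT_YEAR[category])


rebuilt = [treat(category, year) for _, category, year in ROWS]
print(rebuilt)
```

The rebuilt column matches the Treat column in the sample rows, so a year-level rule of this form is at least consistent with the data shown.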

          I understand your point that, with no untreated group, I only have diff-in-diff across treatment timing, and that I can use generalised DID here. But would that analysis be relevant in my case? I want to check whether children's outcomes improve after treatment, relative to before, in early-treated versus late-treated districts.


          • #6
            It's not a panel in UnitID. You basically have a panel of 3 districts and 8 years. Everyone in a district gets the treatment in a specific year. Is there any other way to group the data (county, state, etc.)?


            • #7
              Yes, it's not a panel in UnitID. I pooled the data by appending the two DHS survey rounds. I have 640 districts and 7 years in my dataset, and the total number of observations is around 700,000. The 3 district categories I mentioned correspond to the 3 phases in which districts were treated.


              • #8
                Try collapsing the data to the 640 districts. Then you have a panel.
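For illustration, collapsing means replacing the individual births with one row per district and year. A minimal sketch, with hypothetical district labels and outcome values, that keeps the cell size alongside the mean so that cells can later be weighted by the number of births:

```python
from collections import defaultdict


def collapse(records):
    """Collapse (district, year, y) records to {(district, year): (mean_y, n)}."""
    totals = defaultdict(lambda: [0.0, 0])  # running [sum of y, count] per cell
    for district, year, y in records:
        cell = totals[(district, year)]
        cell[0] += y
        cell[1] += 1
    return {key: (total / n, n) for key, (total, n) in totals.items()}


# Hypothetical individual-level records: (district, birth year, outcome)
births = [("A", 2015, 1.0), ("A", 2015, 3.0), ("A", 2016, 2.0), ("B", 2015, 4.0)]
panel = collapse(births)
print(panel)
```

The result has at most one row per district-year, which is the panel structure referred to above. Individual-level covariates are lost unless they are averaged into the cells as well.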


                • #9
                  Thank you for your quick responses.
                  If I collapse, my analysis would be at the district level, whereas I want to do an individual-level analysis, since my outcome variable is measured at the individual level. How can I use csdid (Callaway and Sant'Anna's estimator) with repeated cross-sections?
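To show what a group-time comparison looks like when the same child is never observed twice, here is a simplified sketch that uses only cell means from individual records, with not-yet-treated cohorts as the comparison group. It is not the csdid implementation: it ignores covariates, weights, and inference, and the data and function names are hypothetical. Running the same comparison on two pre-treatment years gives a placebo estimate, which is one informal way to look at pre-trends.

```python
from statistics import mean


def cell_mean(records, cohorts, year):
    """Mean outcome of births in `year` in districts belonging to `cohorts`."""
    return mean(y for g, t, y in records if g in cohorts and t == year)


def att_gt(records, g, t, base):
    """Unconditional 2x2 DiD for cohort `g` between years `base` and `t`.

    Controls are cohorts first treated after both years (not yet treated).
    With t < g this is a placebo (pre-trend) comparison.
    """
    cohorts = {first for first, _, _ in records}
    controls = {c for c in cohorts if c != g and c > max(t, base)}
    if not controls:
        raise ValueError("no not-yet-treated cohort available for this cell")
    change_treated = cell_mean(records, {g}, t) - cell_mean(records, {g}, base)
    change_control = cell_mean(records, controls, t) - cell_mean(records, controls, base)
    return change_treated - change_control


# Hypothetical records: (first-treatment year of district, birth year, outcome)
records = [
    (2017, 2015, 1.0), (2017, 2015, 3.0), (2017, 2016, 3.0),
    (2017, 2017, 6.0), (2017, 2017, 8.0),
    (2019, 2015, 0.0), (2019, 2015, 2.0), (2019, 2016, 2.0),
    (2019, 2017, 3.0), (2019, 2017, 5.0),
]
print(att_gt(records, g=2017, t=2017, base=2016))  # post-treatment effect
print(att_gt(records, g=2017, t=2016, base=2015))  # placebo on pre-period
```

Because only cell means are needed, each survey round can contribute different children to each cell. For the last-treated cohort no comparison group remains, so the function raises an error rather than returning a number.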
