  • Omitted Values in csdid Analysis for Repeated Cross-Sectional Data

    Hello,

    I'm working on a project analyzing the impact of a policy intervention on secondary school enrollment rates for girls in different districts of Gujarat, India (total 26 districts). The data is a repeated cross-section at the individual level, and my objective is to estimate the effect at the district level. My dataset includes the following years: 2009, 2010, 2011, 2012, 2013, 2014, 2016, 2018, 2022.

    Treatment Groups:
    Treated in 2016: 9 districts
    Treated in 2018: 13 districts
    Never Treated: 4 districts

    I have now created all the relevant variables, such as gvar, enroll, and mean_enroll (aggregated by year and gvar), and the output of tab year gvar also looks fine.

    My main question is, why are so many values omitted in the csdid output, and how can I resolve this issue to get the correct estimates? I would really appreciate any help or pointers!

    I am attaching the outputs here:

    . tab year gvar

           year |          0      2016      2018 |     Total
    ------------+--------------------------------+----------
           2009 |        182       378       543 |     1,103
           2010 |        179       304       568 |     1,051
           2011 |        150       328       572 |     1,050
           2012 |        120       326       445 |       891
           2013 |        143       326       548 |     1,017
           2014 |        145       330       525 |     1,000
           2016 |        159       326       544 |     1,029
           2018 |        193       397       504 |     1,094
           2022 |        210       484       630 |     1,324
    ------------+--------------------------------+----------
          Total |      1,481     3,199     4,879 |     9,559

    ************************************************** **************************************************
    . csdid mean_enroll, time(year) gvar(gvar) method(dripw)
    .....xxxxxxxx.....xxxxxxxx
    Difference-in-difference with Multiple Time Periods

    Number of obs = 6,112
    Outcome model : least squares
    Treatment model: inverse probability
    ------------------------------------------------------------------------------
                 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    g2016        |
     t_2009_2010 |          0  (omitted)
     t_2010_2011 |          0  (omitted)
     t_2011_2012 |          0  (omitted)
     t_2012_2013 |          0  (omitted)
     t_2013_2014 |  -.0030303   .0030257    -1.00   0.317    -.0089606       .0029
     t_2014_2015 |          0  (omitted)
     t_2015_2016 |          0  (omitted)
     t_2015_2017 |          0  (omitted)
     t_2015_2018 |          0  (omitted)
     t_2015_2019 |          0  (omitted)
     t_2015_2020 |          0  (omitted)
     t_2015_2021 |          0  (omitted)
     t_2015_2022 |          0  (omitted)
    -------------+----------------------------------------------------------------
    g2018        |
     t_2009_2010 |          0  (omitted)
     t_2010_2011 |          0  (omitted)
     t_2011_2012 |          0  (omitted)
     t_2012_2013 |          0  (omitted)
     t_2013_2014 |  -.0038095   .0026886    -1.42   0.157    -.0090791      .00146
     t_2014_2015 |          0  (omitted)
     t_2015_2016 |          0  (omitted)
     t_2016_2017 |          0  (omitted)
     t_2017_2018 |          0  (omitted)
     t_2017_2019 |          0  (omitted)
     t_2017_2020 |          0  (omitted)
     t_2017_2021 |          0  (omitted)
     t_2017_2022 |          0  (omitted)
    ------------------------------------------------------------------------------
    Control: Never Treated

    See Callaway and Sant'Anna (2021) for details
    Last edited by Kanika Phogat; 12 Jul 2024, 06:54.

  • #2
    Hi Kanika,
    1) Use method(regress), or omit the method() option altogether.
    2) Make sure you have the latest versions of csdid and drdid.
    3) Make sure your dependent variable is never missing.

    If the problem persists after checking these, please email me the data so I can try to figure this out.
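
    The three checks above could be run roughly as follows. This is only a sketch: it assumes both packages were installed from SSC, and it uses the default estimator by dropping method() rather than guessing at an option spelling.

    ```stata
    * Reinstall the latest versions (assumes the packages come from SSC)
    ssc install drdid, replace
    ssc install csdid, replace

    * The outcome should never be missing: this count should be 0
    count if missing(mean_enroll)

    * Rerun with the default method, i.e. without the method() option
    csdid mean_enroll, time(year) gvar(gvar)
    ```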


    • #3
      Thank you so much, Fernando. I have just emailed you!


      • #4
        I just realized that the problem happens because your pre-treatment period is not in the data.
        The minimum change in year is assumed to be 1, so the period just before treatment (e.g. 2015 for the 2016 cohort) has no observations.
        You may need to recode gvar.
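
        One way to see this in the data is to count, for each treated cohort, the observations in the year immediately before treatment. The sketch below uses the thread's variable names (year, gvar); the local macro names are arbitrary.

        ```stata
        * For each treated cohort, count observations in the year just before treatment
        levelsof gvar if gvar > 0, local(cohorts)
        foreach g of local cohorts {
            local base = `g' - 1
            quietly count if year == `base'
            display "cohort `g': " r(N) " observations in base year `base'"
        }
        ```

        With the years listed in the first post, both counts should be zero, since neither 2015 nor 2017 appears in the data.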


        • #5
          Hi Fernando, thank you so much for your reply. If I understand correctly, gvar is not coded properly because I have data missing for 2015, 2017, 2019, 2020, and 2021.

          Also, after browsing through my data and comparing it to the dataset from the help file, I am still unsure how to correctly recode gvar. I see that I have data for each year, and the values of year and gvar seem to be in line with the analysis.

          Could you please provide some instructions or an example of how to appropriately adjust gvar, given the gaps in my data? Thank you again!


          • #6
            gvar may be correctly coded (truly reflecting the year of first treatment),
            but to do DiD you need both post- and pre-treatment data.
            For CS (Callaway and Sant'Anna) you actually need the period before treatment to be available.
            In your data you seem to have units treated in 2016, but you do not observe them in 2015, so you can't estimate the treatment effect using t-1 as the base year.
            You either need to change your definition of gvar (change 2016 to 2015) or get better data (no missing periods).
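
            A related way to redefine the timing, not spelled out in this thread, is to put both the time variable and gvar on a scale of consecutive survey waves, so that a "period before treatment" always exists. This is a sketch under assumptions: period and gperiod are hypothetical variable names, and the wave numbers rely on the nine years shown in the tab output above.

            ```stata
            * Consecutive wave index: 2009 -> 1, ..., 2014 -> 6, 2016 -> 7, 2018 -> 8, 2022 -> 9
            egen period = group(year)

            * First-treatment wave on the same scale (never treated stay at 0)
            gen gperiod = 0
            replace gperiod = 7 if gvar == 2016
            replace gperiod = 8 if gvar == 2018

            * Same call as before, now with the wave-based variables and the default method
            csdid mean_enroll, time(period) gvar(gperiod)
            ```

            Note that event time then counts survey waves rather than calendar years, so the 2014 wave serves as the base period for the 2016 cohort and the 2016 wave for the 2018 cohort.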


            • #7
              Ah, I understand the issue now. Thank you so much for the explanation, Fernando. I really appreciate it!!
