Hello,
I'm working on a project analyzing the impact of a policy intervention on secondary school enrollment rates for girls across the 26 districts of Gujarat, India. The data are a repeated cross-section at the individual level, and my objective is to estimate the effect at the district level. My dataset covers the following years: 2009, 2010, 2011, 2012, 2013, 2014, 2016, 2018, and 2022.
Treatment Groups:
Treated in 2016: 9 districts
Treated in 2018: 13 districts
Never Treated: 4 districts
I have created all the relevant variables, such as gvar, enroll, and mean_enroll (aggregated by year and gvar), and the output of "tab year gvar" also looks fine.
My main question is: why are so many estimates omitted from the csdid output, and how can I resolve this to get correct estimates? I would really appreciate any help or pointers!
Here are the outputs:
. tab year gvar

      year |         0       2016       2018 |     Total
-----------+---------------------------------+----------
      2009 |       182        378        543 |     1,103
      2010 |       179        304        568 |     1,051
      2011 |       150        328        572 |     1,050
      2012 |       120        326        445 |       891
      2013 |       143        326        548 |     1,017
      2014 |       145        330        525 |     1,000
      2016 |       159        326        544 |     1,029
      2018 |       193        397        504 |     1,094
      2022 |       210        484        630 |     1,324
-----------+---------------------------------+----------
     Total |     1,481      3,199      4,879 |     9,559
************************************************** **************************************************
. csdid mean_enroll, time(year) gvar(gvar) method(dripw)
.....xxxxxxxx.....xxxxxxxx
Difference-in-difference with Multiple Time Periods

                                                         Number of obs = 6,112
Outcome model  : least squares
Treatment model: inverse probability
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
g2016        |
 t_2009_2010 |          0  (omitted)
 t_2010_2011 |          0  (omitted)
 t_2011_2012 |          0  (omitted)
 t_2012_2013 |          0  (omitted)
 t_2013_2014 |  -.0030303   .0030257    -1.00   0.317    -.0089606       .0029
 t_2014_2015 |          0  (omitted)
 t_2015_2016 |          0  (omitted)
 t_2015_2017 |          0  (omitted)
 t_2015_2018 |          0  (omitted)
 t_2015_2019 |          0  (omitted)
 t_2015_2020 |          0  (omitted)
 t_2015_2021 |          0  (omitted)
 t_2015_2022 |          0  (omitted)
-------------+----------------------------------------------------------------
g2018        |
 t_2009_2010 |          0  (omitted)
 t_2010_2011 |          0  (omitted)
 t_2011_2012 |          0  (omitted)
 t_2012_2013 |          0  (omitted)
 t_2013_2014 |  -.0038095   .0026886    -1.42   0.157    -.0090791      .00146
 t_2014_2015 |          0  (omitted)
 t_2015_2016 |          0  (omitted)
 t_2016_2017 |          0  (omitted)
 t_2017_2018 |          0  (omitted)
 t_2017_2019 |          0  (omitted)
 t_2017_2020 |          0  (omitted)
 t_2017_2021 |          0  (omitted)
 t_2017_2022 |          0  (omitted)
------------------------------------------------------------------------------
Control: Never Treated
See Callaway and Sant'Anna (2021) for details