Omitted Values in csdid Analysis for Repeated Cross-Sectional Data

Kanika Phogat

Join Date: Jul 2024

Posts: 9
#1

12 Jul 2024, 06:50

Hello,

I'm working on a project analyzing the impact of a policy intervention on secondary school enrollment rates for girls in different districts of Gujarat, India (total 26 districts). The data is a repeated cross-section at the individual level, and my objective is to estimate the effect at the district level. My dataset includes the following years: 2009, 2010, 2011, 2012, 2013, 2014, 2016, 2018, 2022.

Treatment Groups:
Treated in 2016: 9 districts
Treated in 2018: 13 districts
Never Treated: 4 districts

Now I have created all the relevant variables, such as gvar, enroll, and mean_enroll (aggregated by year and gvar), and the output of "tab year gvar" also looks fine.

My main question is, why are so many values omitted in the csdid output, and how can I resolve this issue to get the correct estimates? I would really appreciate any help or pointers!

I am attaching the outputs here:

. tab year gvar

       year |          0      2016      2018 |     Total
------------+--------------------------------+----------
       2009 |        182       378       543 |     1,103
       2010 |        179       304       568 |     1,051
       2011 |        150       328       572 |     1,050
       2012 |        120       326       445 |       891
       2013 |        143       326       548 |     1,017
       2014 |        145       330       525 |     1,000
       2016 |        159       326       544 |     1,029
       2018 |        193       397       504 |     1,094
       2022 |        210       484       630 |     1,324
------------+--------------------------------+----------
      Total |      1,481     3,199     4,879 |     9,559

****************************************************************************************************
. csdid mean_enroll, time(year) gvar(gvar) method(dripw)
.....xxxxxxxx.....xxxxxxxx
Difference-in-difference with Multiple Time Periods

Number of obs = 6,112
Outcome model : least squares
Treatment model: inverse probability
------------------------------------------------------------------------------
             | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
g2016        |
 t_2009_2010 |          0  (omitted)
 t_2010_2011 |          0  (omitted)
 t_2011_2012 |          0  (omitted)
 t_2012_2013 |          0  (omitted)
 t_2013_2014 |  -.0030303   .0030257    -1.00   0.317    -.0089606       .0029
 t_2014_2015 |          0  (omitted)
 t_2015_2016 |          0  (omitted)
 t_2015_2017 |          0  (omitted)
 t_2015_2018 |          0  (omitted)
 t_2015_2019 |          0  (omitted)
 t_2015_2020 |          0  (omitted)
 t_2015_2021 |          0  (omitted)
 t_2015_2022 |          0  (omitted)
-------------+----------------------------------------------------------------
g2018        |
 t_2009_2010 |          0  (omitted)
 t_2010_2011 |          0  (omitted)
 t_2011_2012 |          0  (omitted)
 t_2012_2013 |          0  (omitted)
 t_2013_2014 |  -.0038095   .0026886    -1.42   0.157    -.0090791      .00146
 t_2014_2015 |          0  (omitted)
 t_2015_2016 |          0  (omitted)
 t_2016_2017 |          0  (omitted)
 t_2017_2018 |          0  (omitted)
 t_2017_2019 |          0  (omitted)
 t_2017_2020 |          0  (omitted)
 t_2017_2021 |          0  (omitted)
 t_2017_2022 |          0  (omitted)
------------------------------------------------------------------------------
Control: Never Treated

See Callaway and Sant'Anna (2021) for details

Last edited by Kanika Phogat; 12 Jul 2024, 06:54.
Tags: csdid, staggered did
FernandoRios

Join Date: Apr 2014

Posts: 2459
#2

12 Jul 2024, 09:42

Hi Kanika,
1) Use method(regress) or no method option at all.
2) Make sure you have the latest versions of csdid and drdid.
3) Make sure your dependent variable is always observed.

If the problem persists after checking these, please send me an email with the data so I can try to figure this out.
Kanika Phogat

Join Date: Jul 2024

Posts: 9
#3

14 Jul 2024, 05:10

Thank you so much, Fernando. I have just emailed you!
FernandoRios

Join Date: Apr 2014

Posts: 2459
#4

14 Jul 2024, 07:00

I just realized:
the problem happens because your pre-treatment period is not in the data.
This happens because the minimum year change is assumed to be 1, so there is nothing in the period just before treatment.
You may need to recode gvar.
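
One way to check this is to count, for each treated cohort, the observations in the calendar year just before treatment. This is only a sketch: it assumes the variable names year and gvar used earlier in the thread, and that never-treated districts are coded gvar == 0, as in the tab output.

```stata
* Sketch: for each treated cohort, count observations in the year before
* first treatment (the base period that csdid needs).
* Assumes year and gvar exist, with gvar == 0 for never-treated units.
levelsof gvar if gvar > 0, local(cohorts)
foreach g of local cohorts {
    local base = `g' - 1
    quietly count if year == `base'
    display "cohort `g': observations in base year `base' = " r(N)
}
```

Since the tab output has no rows for 2015 or 2017, both cohorts should report zero observations in their base year.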
Kanika Phogat

Join Date: Jul 2024

Posts: 9
#5

14 Jul 2024, 10:52

Hi Fernando, thank you so much for your reply. If I understand correctly, gvar is not coded properly because I have data missing for 2015, 2017, 2019, 2020, and 2021.

Also, after browsing through my data and comparing it to the dataset from the help file, I am still unsure how to correctly recode gvar. I see that I have data for each year, and the values of year and gvar seem to be in line with the analysis.

Could you please provide some instructions or an example of how to appropriately adjust gvar, given the gaps in my data? Thank you again!
FernandoRios

Join Date: Apr 2014

Posts: 2459
#6

14 Jul 2024, 15:39

Gvar may be correctly coded (truly reflecting the year of first treatment),
but to do DiD you need both post- and pre-treatment data.
For CS you actually need the period before treatment to be available.
In your data you seem to have units treated in 2016, but you do not observe them in 2015, so you can't estimate the treatment effect using t-1 as the base year.
You either need to change your definition of gvar (change 2016 to 2015) or get better data (no missing periods).
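
A minimal sketch of two ways the recoding might be done, using the variable names from the thread. Option A follows the suggestion above (2016 becomes 2015); extending it to the 2018 cohort (2018 becomes 2017) is an assumption made here for symmetry. Option B is an alternative that is not from the reply above: it maps the observed survey years onto consecutive period indices so that a "t-1" period always exists. The names gvar_shift, tindex, and gindex are hypothetical, and neither option is guaranteed to remove every omitted cell.

```stata
* Option A: shift each cohort so that its base year (gvar - 1) is observed.
* 2016 -> 2015 is the suggestion above; 2018 -> 2017 is an added assumption.
gen gvar_shift = gvar
replace gvar_shift = 2015 if gvar == 2016
replace gvar_shift = 2017 if gvar == 2018
csdid mean_enroll, time(year) gvar(gvar_shift)

* Option B: replace calendar years with consecutive wave indices.
* egen group() numbers the observed years 1, 2, 3, ... in sorted order.
egen tindex = group(year)

* Give each cohort the index of its first-treatment year; never treated stay 0.
gen gindex = 0
levelsof gvar if gvar > 0, local(cohorts)
foreach g of local cohorts {
    quietly summarize tindex if year == `g', meanonly
    replace gindex = r(mean) if gvar == `g'
}

tab tindex gindex      // check that each cohort has data at gindex - 1
csdid mean_enroll, time(tindex) gvar(gindex)
```

With Option B the periods are survey waves rather than calendar years, so one step in event time can span more than one year (for example, 2018 to 2022), and the estimates need to be read with that in mind.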
Kanika Phogat

Join Date: Jul 2024

Posts: 9
#7

14 Jul 2024, 17:39

Ah, I understand the issue now. Thank you so much for the explanation, Fernando. I really appreciate it!!