csdid: Defining never treated individuals

Frida Luise

Join Date: Jun 2025

Posts: 2
#1

02 Jun 2025, 04:39

Dear all,

I am estimating the effect of a school reform on labor market outcomes using panel data. Since the reform (which cut one year of high school) was implemented in a staggered way across the federal states, I decided to use csdid.

Now I am having trouble defining the time() and gvar() variables so that the command runs properly.

I ran csdid with ivar(person_id), time(grad_year) and gvar(g8_grad_year). The variable "grad_year" is the year in which a person graduated; "g8_grad_year" is defined as the first year in which students graduated with one year less schooling, i.e. the start of the treatment in each federal state.

My code therefore looks like this:

Code:

gen gym_start_year = birthdate + 10 // defining the year in which kids transition to secondary school
gen g8_grad_year = . // anyone not matched by a replace line below stays missing
replace g8_grad_year = 2011 if school_fedstate == 8 & gym_start_year >= 2003
replace g8_grad_year = 2011 if school_fedstate == 9 & gym_start_year >= 2003
replace g8_grad_year = 2012 if school_fedstate == 11 & gym_start_year >= 2004
replace g8_grad_year = 2012 if school_fedstate == 12 & gym_start_year >= 2006
replace g8_grad_year = 2011 if school_fedstate == 4 & gym_start_year >= 2003
replace g8_grad_year = 2010 if school_fedstate == 2 & gym_start_year >= 2002
replace g8_grad_year = 2011 if school_fedstate == 13 & gym_start_year >= 2003
replace g8_grad_year = 2011 if school_fedstate == 3 & gym_start_year >= 2003
replace g8_grad_year = 2012 if school_fedstate == 5 & gym_start_year >= 2004
replace g8_grad_year = 2009 if school_fedstate == 10 & gym_start_year >= 1999
replace g8_grad_year = 2010 if school_fedstate == 15 & gym_start_year >= 2001
replace g8_grad_year = 2014 if school_fedstate == 1 & gym_start_year >= 2006
csdid income controls, ivar(person_id) time(grad_year) gvar(g8_grad_year) method(dripw)

Stata now keeps giving me the following messages:

No never treated observations found. Using Not yet treated data
Units always treated found. These will be ignored.
Panel is not balanced.
Will use observations with Pair balanced (observed at t0 and t1)

I do have many never-treated individuals, since my panel includes plenty of older individuals who graduated well before the reform was implemented.

But Stata somehow does not use them as a control group, even though they should be identifiable through my time variable (individual_graduation_year < first_g8_grads_in_fed_state). Does anyone know why this is happening?

I am very grateful for any thoughts on this issue!

Best,
Frida
Clyde Schechter

Join Date: Apr 2014

Posts: 30170
#2

02 Jun 2025, 12:48

From the help file for -csdid-:

gvar(varname) Variable identifying treatment groups or cohorts. Groups that are never treated should be coded as Zero. Any positive value indicates which year a group was initially treated. And once a group is treated, the underlying assumption is that it always remains treated. [emphasis added]
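A minimal sketch of that coding rule, assuming `g8_grad_year` was built as in the first post, so that anyone not matched by a `replace` line is left missing. The three rows are hypothetical. Judging from the messages in the first post, rows with a missing gvar are not counted as never treated, so the recode matters; whether older cohorts from reform states should really count as never treated is a separate design question.

```stata
* Hypothetical toy rows: a missing g8_grad_year means no -replace- line matched
clear
input long person_id byte school_fedstate int g8_grad_year
1  8 2011
2 10 2009
3 14    .
end

* The -csdid- help asks for gvar == 0 for never-treated groups, not missing
replace g8_grad_year = 0 if missing(g8_grad_year)

* Every row should now carry 0 or a first-treatment year
assert !missing(g8_grad_year) & g8_grad_year >= 0
tab g8_grad_year, missing
```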
Frida Luise

Join Date: Jun 2025

Posts: 2
#3

03 Jun 2025, 03:31

Originally posted by Clyde Schechter

From the help file for -csdid-:

Dear Clyde,

Thank you very much for your reply! I did code never-treated groups as zero, but I am still getting the following message:

Code:

Panel is not balanced
Will use observations with Pair balanced (observed at t0 and t1)

So, for each individual, csdid seems to need observations from both before and after the treatment. My dataset contains multiple observations per person, but since labor market outcomes are only measured after school, all the relevant observations are post-treatment.
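A quick way to check this, sketched on hypothetical toy rows: with `time(grad_year)`, all of a person's rows share one graduation year, so no person is observed at two different values of the time variable and no within-person pre/post pair can be formed, however many survey waves exist. The variable `survey_year` is an illustrative name, not one from the post.

```stata
* Hypothetical toy rows: several survey years per person, one graduation year each
clear
input long person_id int survey_year int grad_year
1 2015 2008
1 2016 2008
1 2017 2008
2 2015 2012
2 2016 2012
end

* Mark the first row of each person x grad_year combination ...
bysort person_id grad_year: gen byte first_t = (_n == 1)
* ... and count the distinct grad_year values per person
by person_id: egen n_times = total(first_t)

* A value of 1 everywhere means no person is observed at two values of time()
tab n_times
```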

I only have both pre- and post-treatment data at the state level.

Do you know if there is a way I can still use the csdid approach? Am I missing an obvious point here?

Thank you very much!

Best,
Frida
FernandoRios

Join Date: Apr 2014

Posts: 2491
#4

03 Jun 2025, 04:15

Can you do two things?
First, run a simple regression with all your variables of interest (order doesn’t matter).
Then: gen sample = e(sample)
Then: tab year gvar if sample==1
Then show me the result.
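The steps above, written out as a sketch on hypothetical toy data; `x1` stands in for the real controls, and the generic `year`/`gvar` become `grad_year`/`g8_grad_year`. Because -regress- drops every row with a missing value in any listed variable, `e(sample)` marks only the complete cases, so a row with a missing gvar would also vanish from the tabulation.

```stata
* Hypothetical toy data standing in for the real panel
clear
set seed 2025
set obs 60
gen int grad_year    = 2010 + mod(_n, 6)   // 2010-2015
gen int g8_grad_year = 2010 + mod(_n, 3)   // 2010-2012
replace g8_grad_year = 0 in 1/10           // some never-treated rows
gen double income    = rnormal(3000, 500)
gen double x1        = rnormal()           // placeholder control
replace income = . in 51/60                // missing outcomes

* 1. Any regression containing all variables -csdid- will use
regress income x1 grad_year g8_grad_year
* 2. Flag the rows that survived listwise deletion
gen byte sample = e(sample)
* 3. Cross-tabulate time (rows) against cohort (columns) in that sample
tab grad_year g8_grad_year if sample == 1
```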
Frida Luise

Join Date: Jun 2025

Posts: 2
#5

03 Jun 2025, 07:29

Thank you for your help, Fernando!

This is the result:

Code:

           |     g8_grad_year
 grad_year |  2010   2011   2012 |  Total
-----------+---------------------+-------
      2010 |     7      0      0 |      7
      2011 |     4      9      0 |     13
      2012 |     4     23     29 |     56
      2013 |     0     12      4 |     16
      2014 |     3      5      5 |     13
      2015 |     0      1      3 |      4
-----------+---------------------+-------
     Total |    18     50     41 |    109

I have a lot of missing values in my dataset, which reduced the sample size substantially here. Could this be an issue?
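A sketch for seeing which variables drive the loss of observations, on hypothetical toy data; `x1` again stands in for the controls. -misstable summarize- reports missing values by variable, and -egen rowmiss()- counts missing values per row, so the last line gives the number of complete cases an estimator could use.

```stata
* Hypothetical toy rows with scattered missing values
clear
input double income double x1 int g8_grad_year
3100 0.2 2011
   . 0.5 2012
2800   . 2011
2950 0.1    .
end

* Missing-value counts per variable
misstable summarize income x1 g8_grad_year

* Rows that are complete on every analysis variable
egen byte n_miss = rowmiss(income x1 g8_grad_year)
count if n_miss == 0
```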

Actually, I was also wondering whether my design is really staggered at all, even though the reform I am studying was implemented in a staggered way.

Since I am looking at labor market outcomes, I dropped observations of individuals younger than 25 from my final sample. I used data from before the treatment only to find out where people went to school and to extract some socioeconomic controls.

Consequently, all my observations are post-treatment (i.e., after individuals either were or were not exposed to the school reform).

Could a simple TWFE approach be sufficient in this case? Is this where my mistake lies?

Best,
Frida
FernandoRios

Join Date: Apr 2014

Posts: 2491
#6

03 Jun 2025, 12:12

You can see here that there is NO never-treated group (gvar is never zero).
You also have zeroes before treatment happens, and that is incorrect.
Even with repeated cross-section data, the underlying assumption for using DID in general (and CSDID in particular) is that you know exactly when a group would have been treated. So you cannot have zeroes.
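One possible reading of this advice, sketched under assumptions rather than as a confirmed fix: treat the federal state as the group, give every graduate of a reform state that state's first G8 graduation year regardless of their own cohort, code 0 only for states that never introduced the reform, and let `grad_year` supply the pre- and post-periods. The toy rows are hypothetical, only two states from the first post are shown, and the rule that unlisted states never switched is an assumption to verify against the actual reform history.

```stata
* Hypothetical toy rows: one row per person (state and graduation year)
clear
input long person_id byte school_fedstate int grad_year
1 10 2005
2 10 2012
3  8 2008
4  8 2013
5 14 2010
end

* State-level first G8 graduation year (values copied from the first post)
gen int g8_grad_year = .
replace g8_grad_year = 2009 if school_fedstate == 10
replace g8_grad_year = 2011 if school_fedstate == 8
* ASSUMPTION: states not listed never introduced the reform
replace g8_grad_year = 0 if missing(g8_grad_year)

* Cohorts graduating before their state's first G8 year are pre-treatment
gen byte treated = g8_grad_year > 0 & grad_year >= g8_grad_year

* As I read the -csdid- help, omitting ivar() treats the data as repeated
* cross-sections, e.g.:
* csdid income controls, time(grad_year) gvar(g8_grad_year) method(dripw)
```

With gvar constant within a state, the older cohorts of a reform state serve as that state's pre-treatment observations instead of being pooled into a never-treated group.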