Staggered intervention on the individual level

Eleni Kyrkopoulou

Join Date: Nov 2020
Posts: 2

19 Sep 2024, 12:08

Hi everyone,

May I please ask for your help on the following subject?

I have a sample of students from different schools that entered the 2005 lottery to move to a better school. Some students were randomly selected and moved to the new school in either 2006 or 2007. I want to estimate the effect of moving to a better school on their grades. I have the grades of all students (selected and non-selected, before and after the move).

I cannot use hdidregress, because the treatment was not applied to whole groups; it was assigned to individual students. Reading around the forum, I became more confused about whether this is a staggered diff-in-diff or a stepped-wedge design. If anyone is familiar with the subject, I would really appreciate the help.

In the following example, id is the id of the student, school is the name of the school, treated indicates whether the student has moved to the new school, grade is the grade of the student, and year is the year.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float id str19 school float treated double grade float year
 1 "51st CA"        0 16.153846740722656 2004
 1 "51st CA"        0              15.75 2005
 1 "51st CA"        0                 15 2006
 2 "51st CA"        0  17.08333396911621 2004
 2 "51st CA"        0 17.090909957885742 2005
 2 "51st CA"        0 15.833333015441895 2006
 3 "23rd Rd Island" 0 13.461538314819336 2005
 3 "23rd Rd Island" 0                 16 2006
 3 "23rd Rd Island" 0                 12 2007
 4 "23rd Rd Island" 0  16.83333396911621 2004
 4 "23rd Rd Island" 0 16.090909957885742 2005
 4 "23rd Rd Island" 1 15.833333015441895 2006
 5 "51st CA"        0  18.66666603088379 2005
 5 "51st CA"        1  17.41666603088379 2006
 5 "51st CA"        1 15.583333015441895 2007
 5 "51st CA"        1 15.115385055541992 2008
 5 "51st CA"        1 15.630768775939941 2009
 5 "51st CA"        1 17.127273559570313 2010
 6 "Diap 090"       0  14.15384578704834 2004
 6 "Diap 090"       0 12.916666984558105 2005
 6 "Diap 090"       0 13.083333015441895 2006
 7 "Diap 090"       0              18.75 2005
 7 "Diap 090"       0               14.7 2006
 7 "Diap 090"       0                 15 2007
 8 "Diap 090"       0 18.538461685180664 2004
 8 "Diap 090"       0                 16 2005
 8 "Diap 090"       0                 14 2006
 9 "Diap 090"       0 10.692307472229004 2004
 9 "Diap 090"       0                 16 2005
 9 "Diap 090"       0                 13 2006
10 "Diap 090"       0 13.076923370361328 2005
10 "Diap 090"       0 11.583333015441895 2006
10 "Diap 090"       0                 13 2007
end

Thank you for reading my post!


Erik Ruzek

Join Date: Oct 2017

Posts: 420
#2

20 Sep 2024, 12:08

Eleni,

A couple of questions...
For those who win the lottery and are assigned to the new school, shouldn't their school id change?

What is the goal of the analysis?

I am not as versed in the various econometric approaches to this kind of analysis as others. That said, I do research in education, and you have something rare, which is random assignment to the school treatment. So one approach would be to use random or mixed effects models to analyze the panel. This can be fairly straightforward if, for example, all you are interested in is the control-treatment contrast:

Code:

* two major sources of cluster variance - multiple students from a school and multiple observations of the same student
mixed grade i.treated || sid: || id:, reml // appropriately accounts for nested structure
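Note that sid is not a variable in the posted example, which has only the string variable school. A minimal sketch of one way to create it, assuming sid is meant to be a numeric school identifier built from school:

```stata
* Assumption: sid is a numeric school identifier derived from the string variable school
encode school, generate(sid)

* Check the mapping from school names to numeric codes
tabulate sid
```

In the posted data the school name does not change for treated students, so an identifier built this way reflects the school of origin rather than the school actually attended.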

If the answer to my first question is yes, then treated students will be switching schools and you need to appropriately account for this crossing of student and school. With this kind of data, control students could also switch schools:

Code:

mixed grade i.treated || _all: R.sid || id:, reml // this is the most efficient code if you have fewer schools than students

Now you have the basic treatment-control contrast, but I can imagine being interested in other questions. For example, does the proportion of time a student spent in a treatment school provide a boost?

Code:

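bysort id: egen imn_trt = mean(treated)
mixed grade i.treated c.imn_trt || _all: R.sid || id:, reml
* Instead of c.imn_trt, you could use i.imn_trt if you have only three or four proportions in the data
* (factor-variable notation requires integer values, so the proportions would need to be recoded into integer categories first)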

Does the effect of being in treatment or control vary across students?

Code:

mixed grade i.treated c.imn_trt || _all: R.sid || id: i.treated, cov(un) reml

You can go in lots of directions from here, including, but not limited to, interactions between i.treated and school characteristics. As you build models that are nested within one another, you can use model testing to determine whether a more complex model fits better than a simpler one. Use estimates store to store model results and run likelihood-ratio tests for the model comparison (lrtest m0 m1, stats). Note that if the models you are testing differ in the non-varying (fixed effects) predictors, then you will need to estimate them using full maximum likelihood (the default in mixed) rather than REML. I used reml here because you showed us data from just three schools. I imagine you have many more schools in your data, and you can remove the reml option if so.
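As a rough sketch of that model comparison, assuming sid is a numeric school identifier and imn_trt has been created as above (the names m0 and m1 are just labels for the stored results):

```stata
* Both models are fit by ML (the default in mixed), not REML,
* because they differ in their fixed-effects predictors
mixed grade i.treated || _all: R.sid || id:
estimates store m0

mixed grade i.treated c.imn_trt || _all: R.sid || id:
estimates store m1

* Likelihood-ratio test of m1 against the nested model m0; stats adds AIC and BIC
lrtest m0 m1, stats
```

A small p-value here would suggest that adding c.imn_trt improves fit over the simpler model.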

Last edited by Erik Ruzek; 20 Sep 2024, 12:10. Reason: Clarification about model testing and reml
Erik Ruzek

Join Date: Oct 2017

Posts: 420
#3

21 Sep 2024, 12:25

One thing I did not address is how to handle time. In the mixed modeling framework, it can be handled quite flexibly, and the right choice depends on what kinds of comparisons you want to make. If you want to make within-year comparisons, then you can add year to the non-varying (fixed) part of the model as a categorical predictor (i.year). If you want to look at between-student differences in rates of change, then you can model year as a continuous variable and allow each student to have a unique growth trajectory:

Code:

* Recode year so 0 is meaningful
gen year0 = year - 2004 // first year of your panel
mixed grade i.treated c.imn_trt c.year0 || _all: R.sid || id: c.year0, cov(un) reml
* you can model time with a quadratic term if appropriate (c.year0##c.year0)
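For the within-year comparison mentioned above, a minimal sketch, assuming sid is a numeric school identifier and imn_trt is the per-student mean of treated, might look like this:

```stata
* Year enters the fixed part as a set of indicators, so the treated contrast
* is estimated net of year-to-year shifts in average grades
mixed grade i.treated c.imn_trt i.year || _all: R.sid || id:, reml

* Joint Wald test of the year indicators
testparm i.year
```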

And, as I stated in post #2, you can interact the varying slope variables with other student and school characteristics. Perhaps most critical, depending on your RQ, is to interact time (either i.year or c.year0) with treatment. This gives you the treatment effect in each year (i.treated##i.year) or differential growth in grades by treatment status (i.treated##c.year0). These models can be endlessly augmented. The key thing is to be very clear about your research questions and to map those onto the appropriate model.
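A hedged sketch of the two time-by-treatment specifications, assuming sid is a numeric school identifier and year0 = year - 2004 as in the code above:

```stata
* Treatment effect in each year
mixed grade i.treated##i.year || _all: R.sid || id:, reml

* Treated-vs-control difference by year; years with no treated students
* (before the first moves in 2006) should show up as not estimable
margins year, dydx(treated)

* Differential growth in grades by treatment status
mixed grade i.treated##c.year0 || _all: R.sid || id: c.year0, cov(un) reml
* The coefficient on 1.treated#c.year0 is the difference in yearly slope for treated observations
```

Because treated switches on only from 2006 onward, the year-specific effects are identified only for the post-lottery years.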

Last edited by Erik Ruzek; 21 Sep 2024, 12:35. Reason: Added information about time by treatment interaction