I've got data that looks like the following:
ID | Year | Enter | Event | Died |
1 | 2015 | 1 | 0 | 0 |
2 | 2012 | 1 | 0 | 0 |
2 | 2018 | 0 | 1 | 0 |
3 | 2014 | 1 | 0 | 0 |
3 | 2020 | 0 | 0 | 1 |
Enter indicates the time when they enter the risk pool, Event indicates they experienced the outcome, and Died indicates they are censored. So we can see that person 1 entered the risk pool in 2015 and we have no further data, person 2 entered the risk pool in 2012 and experienced the event in 2018, and person 3 entered the risk pool in 2014 but died in 2020.
In addition, the data run until 2022, so person 1 would be considered censored in 2022.
I additionally have a gender variable, and I'm interested in examining whether there are gender differences in the hazards. The tricky bit is that there's a policy change in 2017 that may have changed things, so I'm interested in examining some sort of survival analogue of a diff-in-diff; that is, looking at whether the gender difference in hazard rates differs between the pre-2017 and post-2017 periods. (E.g. in a non-survival model, I'd do something like `mixed y i.prepost##i.gender##c.year || id:`.)
I'm assuming the way to do this is to expand the dataset to the person-year level, generate a `prepost` variable, and include it as `prepost##gender` in my `stcox`.
If that's an appropriate way to proceed, can anyone offer any suggestions for how to manipulate the data to get there? I think it would need to look like the following (I'm suppressing the 0's in the last four columns to simplify):
ID | Year | Enter | Event | Died | prepost |
1  | 2015 | 1     |       |      |         |
1  | 2016 |       |       |      |         |
1  | 2017 |       |       |      |         |
1  | 2018 |       |       |      | 1       |
1  | 2019 |       |       |      | 1       |
1  | 2020 |       |       |      | 1       |
1  | 2021 |       |       |      | 1       |
1  | 2022 |       |       | 1    | 1       |
2  | 2012 | 1     |       |      |         |
2  | 2013 |       |       |      |         |
2  | 2014 |       |       |      |         |
2  | 2015 |       |       |      |         |
2  | 2016 |       |       |      |         |
2  | 2017 |       |       |      |         |
2  | 2018 |       | 1     |      | 1       |
3  | 2014 | 1     |       |      |         |
3  | 2015 |       |       |      |         |
3  | 2016 |       |       |      |         |
3  | 2017 |       |       |      |         |
3  | 2018 |       |       |      | 1       |
3  | 2019 |       |       |      | 1       |
3  | 2020 |       |       | 1    | 1       |
If that's not an appropriate way to proceed, can anyone suggest a better modeling approach?
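For concreteness, here is a minimal sketch of the reshaping step, offered as one possible route rather than a vetted solution. It assumes that each person has exactly one Enter row and at most one Event/Died row, that anyone without an exit row is censored in 2022, and that `prepost` switches to 1 from 2018 onward, as in the target layout. The helper names `entry_year`, `exit_year`, `had_event`, and `n_years` are made up for illustration, and the toy rows are the three people from the first table.

```stata
* Toy data in the original layout (one row per recorded transition).
clear
input id year enter event died
1 2015 1 0 0
2 2012 1 0 0
2 2018 0 1 0
3 2014 1 0 0
3 2020 0 0 1
end

* Collapse to one row per person: entry year, exit year, and whether the exit was an event.
bysort id (year): gen entry_year = year[1]
bysort id (year): gen exit_year  = year[_N] if event[_N] == 1 | died[_N] == 1
bysort id (year): gen had_event  = event[_N]
bysort id (year): keep if _n == 1
replace exit_year = 2022 if missing(exit_year)   // assumption: censored at end of data
drop year enter event died

* Expand to one row per person-year, from entry_year through exit_year inclusive.
gen n_years = exit_year - entry_year + 1
expand n_years
bysort id: gen year = entry_year + _n - 1

* Rebuild the indicators on the expanded rows.
gen enter   = (year == entry_year)
gen event   = (year == exit_year) & (had_event == 1)
gen died    = (year == exit_year) & (had_event == 0)   // Died doubles as the censoring flag
gen prepost = (year >= 2018)

sort id year
list id year enter event died prepost, sepby(id) noobs
```

This only builds the person-year file. How it should then be `stset` (time scale, entry, and failure definition) is a separate modeling choice that the sketch does not settle. Stata's `stsplit` may also be worth a look as an alternative to expanding by hand.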