  • stcox diff-in-diff?

    I've got data that looks like the following:
    ID Year Enter Event Died
    1 2015 1 0 0
    2 2012 1 0 0
    2 2018 0 1 0
    3 2014 1 0 0
    3 2020 0 0 1
    Enter indicates the time when they enter a risk pool, Event indicates that they experienced the outcome, and Died indicates that they are censored. So we can see that person 1 entered the risk pool in 2015 and we have no further data, person 2 entered the risk pool in 2012 and experienced the event in 2018, and person 3 entered the risk pool in 2014 but died in 2020.

    In addition, the data runs until 2022, so person 1 would be considered as censored starting in 2022.

    I additionally have a gender variable, and I'm interested in examining whether there are gender differences in the hazards. The tricky bit is that there's a policy change in 2017 that may have changed things, so I'm interested in examining some sort of survival analogue of a diff-in-diff, that is, looking at whether the gender differences in hazard rates differ between the pre-2017 and post-2017 periods. (E.g. in a non-survival model, I'd do something like `mixed y i.prepost##i.gender##c.year || id:`.)

    I'm assuming the way to do this is to expand the data-set to full year-person level, generate a `prepost` variable, and include it as `prepost##gender` in my `stcox`.

    If that's an appropriate way to proceed, can anyone offer any suggestions on how to manipulate the data to get there? I think it would need to look like:
    ID  Year  Enter  Event  Died  prepost
    1   2015  1
    1   2016
    1   2017
    1   2018                      1
    1   2019                      1
    1   2020                      1
    1   2021                      1
    1   2022                1     1
    2   2012  1
    2   2013
    2   2014
    2   2015
    2   2016
    2   2017
    2   2018         1            1
    3   2014  1
    3   2015
    3   2016
    3   2017
    3   2018                      1
    3   2019                      1
    3   2020                1     1
    (I'm suppressing the 0's in the last four columns to simplify.)

    If that's not an appropriate way to proceed, can anyone suggest a better modeling approach?
    Last edited by Josh Errickson; 01 Jun 2023, 13:05.

  • #2
    The general approach makes sense. The specific implementation you show will fail. Your modified data set will need to look like your original data set in that the dichotomous variables need to be coded 1/0. The 1/missing version you show in the second table will result in the omission of every observation from the estimation sample, and -stcox- will simply halt with a message telling you there are no observations to process.
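
    As a small illustration of the missing-value point, here is a minimal sketch using a made-up four-row data set (the values are invented for this example, not taken from the real data). Stata estimation commands use only observations in which every model variable is nonmissing, so a 1/missing indicator has to be recoded to 1/0 before it can serve as a covariate.

    ```stata
    clear
    input byte id int year byte prepost
    1 2016 .
    1 2017 .
    1 2018 1
    1 2019 1
    end

    * Only rows with a nonmissing covariate can enter an estimation sample
    count if !missing(prepost)

    * Recode missing to 0 so that every row is usable
    replace prepost = 0 if missing(prepost)
    count if !missing(prepost)
    ```

    The first -count- reports the rows that would survive with 1/missing coding; the second shows that all rows are available after recoding.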


    • #3
      Yes, I understand that, as I mentioned, I suppressed the 0's in that table to make the data easier to read. I'm asking for advice in transforming my data into that layout.


      • #4
        "Yes, I understand that, as I mentioned, I suppressed the 0's in that table to make the data easier to read."
        Sorry, but we do see a lot of posts here where regression analyses fail because of 1/missing variable coding. And while you said you did it to simplify, I couldn't infer from that alone that you knew this coding wouldn't actually work in Stata.

        You can transform your data as follows:
        Code:
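        * Example generated by -dataex-. For more info, type help dataex
        clear
        input byte id int year byte(enter event died)
        1 2015 1 0 0
        2 2012 1 0 0
        2 2018 0 1 0
        3 2014 1 0 0
        3 2020 0 0 1
        end

        * Declare the panel structure, then add a row for every id in every
        * year spanned by the data
        tsset id year
        tsfill , full

        * Drop the person-years before each id enters the risk pool
        by id (year), sort: egen entry_year = min(cond(enter == 1, year, .))
        drop if year < entry_year
        * Drop the person-years after each id's event or death, if any
        by id (year), sort: egen exit_year = min(cond(inlist(1, died, event), year, .))
        drop if year > exit_year
        * Policy-period indicator: 0 through 2017, 1 from 2018 onward
        gen byte pre_post = year > 2017
        drop enter entry_year exit_year

        * Rows created by -tsfill- have missing event and died; recode them to 0
        mvencode event died, mv(0) override

        * Multiple-record survival data: event is the failure, died is censoring
        stset year, failure(event == 1) id(id)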
        Now, you won't be able to get a Cox regression on pre_post out of the example data because, for example, all of the events and deaths occur after 2017. But presumably that is not the case in the real data set. Similarly, the example data has no gender variable, but the real data set presumably does.
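
        For what the model itself might look like, here is a minimal sketch on simulated person-year data. Everything in it is an assumption made for illustration: the data are randomly generated with no real effects, and the variables gender and entry_year are hypothetical stand-ins for what the real data set would contain. The sketch also passes origin() to -stset- so that analysis time is years since entry, which is a choice of this example and differs from the -stset- call above. The diff-in-diff analogue is the pre_post#gender interaction term.

        ```stata
        clear
        set seed 2023
        set obs 400
        gen int id = _n
        gen byte gender = runiform() < 0.5                  // hypothetical 0/1 covariate
        gen int entry_year = 2010 + floor(8 * runiform())   // entry years 2010-2017

        * One row per person-year from entry through 2022
        expand 2022 - entry_year + 1
        bysort id: gen int year = entry_year + _n - 1
        gen byte pre_post = year > 2017

        * Simulated event indicator; keep rows up to and including the first event
        gen byte event = runiform() < 0.05
        bysort id (year): gen byte prior = sum(event) - event
        drop if prior > 0
        drop prior

        * Analysis time is years since entry (an assumption of this sketch)
        stset year, failure(event == 1) id(id) origin(time entry_year - 1)

        * Main effects plus the interaction of policy period and gender
        stcox i.pre_post##i.gender
        ```

        The hazard ratio reported for 1.pre_post#1.gender is the ratio of the gender hazard ratio after 2017 to the gender hazard ratio before it.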

        In the future, when showing data examples, please use the -dataex- command to do so, as I have done here. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
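
        As a rough illustration of that workflow, the sketch below uses the auto data set that ships with Stata (chosen only for this example; it has nothing to do with the data in this thread) and asks -dataex- for a few variables and rows.

        ```stata
        sysuse auto, clear
        * Print a copy-and-paste-ready -input- block for three variables, first five rows
        dataex make price mpg in 1/5
        ```

        The output printed in the Results window can be pasted directly into a post.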
