Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • How to refer to a specific date to create a cutpoint that will change based on year

    I have a dataset that includes one yearly observation per individual (1994-2023). Individuals can exit the dataset (death or other reasons) and a cut off date (July 1st) is used to decide what year that happens. If they exit then they have an 'exitdate'. I want to identify when individuals will exit the dataset and tag them for removal in the appropiate year.

    In the example below i show data for 2 individuals, with a record for each year and an exitdate. ID 8 will exit the dataset in 2019 as their exitdate is before July 1st of 2019. ID 13 will exit in 2010 as their exit date is after July 1st of 2009.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long akdnid int year float exitdate
     8 2008 21617
     8 2009 21617
     8 2010 21617
     8 2011 21617
     8 2012 21617
     8 2013 21617
     8 2014 21617
     8 2015 21617
     8 2016 21617
     8 2017 21617
     8 2018 21617
     8 2019 21617
    13 2008 18100
    13 2009 18100
    13 2010 18100
    end
    format %td exitdate

    I have used the following code to identify when each individual should exit.
    Code:
      
     gen year_cut = (mdy(07,01,2008)) if year ==2008
        replace year_cut = (mdy(07,01,2009)) if year ==2009
        replace year_cut = (mdy(07,01,2010)) if year ==2010
        replace year_cut = (mdy(07,01,2011)) if year ==2011
        replace year_cut = (mdy(07,01,2012)) if year ==2012
        replace year_cut = (mdy(07,01,2013)) if year ==2013
        replace year_cut = (mdy(07,01,2014)) if year ==2014
        replace year_cut = (mdy(07,01,2015)) if year ==2015
        replace year_cut = (mdy(07,01,2016)) if year ==2016
        replace year_cut = (mdy(07,01,2017)) if year ==2017
        replace year_cut = (mdy(07,01,2018)) if year ==2018
        replace year_cut = (mdy(07,01,2019)) if year ==2019
        replace year_cut = (mdy(07,01,2020)) if year ==2020
        replace year_cut = (mdy(07,01,2021)) if year ==2021
        replace year_cut = (mdy(07,01,2022)) if year ==2022
        format year_cut %td
    
        gen exit = 1 if exitdate<year_cut
        replace exit = 0 if exitdate>year_cut & exitdate !=.
        drop if exit ==1
    My question is how can i do this more efficiently, perhaps in a loop?

    Thank you



  • #2
    It's just a one-liner:
    Code:
    gen exit_year = cond(exitdate < mdy(7, 1, year(exitdate)), year(exitdate), year(exitdate) + 1)
    drop if year >= exit_year
    Note: You say what happens if the exit date is before or after July 1, but you don't say what happens if it is exactly July 1. The code above assumes that July 1 itself is treated the same way as after July 1. If that's not what you want, change < to <=.

    Added: Actually, even that is unnecessarily complicated:

    Code:
    gen exit_year = year(exitdate) + (exitdate >= mdy(7, 1, year(exitdate)))
    drop if year >= exit_year
    does the same thing.
    Last edited by Clyde Schechter; 22 Jul 2024, 11:22.

    Comment


    • #3
      That is what I wanted. Thank you very much, Clyde

      Comment

      Working...
      X