  • Creating a date between two followups

    Hi,

    I have a large set of observations in a followup file, looking like this:
    ID_CODE  MALIGNANCY  DATE OF SURGERY  DATE OF FOLLOW-UP
    B1020    NO          2019-09-22       2019-12-13
    B1020    NO          2019-09-22       2020-08-17
    B1020    YES         2019-09-22       2021-10-12
    B1020    YES         2019-09-22       2022-06-30
    Since the date on which the malignancy was diagnosed is not recorded, I would like to create a new variable holding a date between the follow-up at which the malignancy was first noted and the follow-up before it, and to drop the other observations.
    I have been using this script provided by the forum to choose the 2021-10-12 observation:
    Code:
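    * keep only follow-ups where a malignancy was recorded
    keep if malignancy == "YES"
    * distance in days between surgery and follow-up
    gen closest = abs(surg_date - followup_date)
    * within each patient, keep the follow-up closest to surgery
    bysort id_code (closest): keep if _n == 1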
    I would essentially like something similar, but yielding a date between this follow-up and the previous one. Unfortunately, I'm not skilled enough to write it myself and would appreciate any help I could get!
    Last edited by Vilma Antonov; 22 Nov 2022, 03:21.

  • #2
    Could you please provide the above data as an extract using the dataex command?


    • #3
      Of course, sorry for not doing it earlier! My data is confidential but this is a made-up replica.
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str16 id_code float malignancy long surg_date float followup_date
      "B1020" . 17123 17754
      "B1020" . 17123 17392
      "B1020" . 17123 18149
      "B1020" . 17123 18423
      "B1020" . 17123 18819
      "B1020" 1 17123 19244
      "B1020" . 17123 19523
      "B1020" . 17123 19988
      "B1020" . 17123 20114
      "B1020" . 16112 16592
      "B1020" . 16112 17842
      "B1020" . 16112 18201
      "B1020" . 16112 18759
      "B1020" . 16112 19147
      "B1020" . 16112 19348
      "B1020" . 16112 19371
      "B1020" . 16112 20146
      "B1020" . 16112 20529
      "B1020" . 16112 20873
      "B1020" . 16112 21314
      "B1020" . 16112 21624
      "B1020" . 16112 21838
      "B1020" . 16112 17582
      "B1020" . 16112 17123
      "B1020" . 16112 16172
      "B2012"  . 17254 17223
      "B2012"  . 17254 18335
      "B2012"  . 17254 18719
      "B2012"  . 17254 19142
      "B2012"  . 17254 19416
      "B2012"  . 17254 19783
      "B2012"  . 17254 20117
      "B2012"  . 17254 20501
      "B2012"  . 17254 20832
      "B2012"  . 17254 17801
      "B2012"  . 17254 21321
      "B2012"  . 17254 21549
      "B2012"  . 17254 21999
      "B2013" . 14634 16767
      "B2013" 1 14634 16817
      "B2013" . 14634 16585
      "B2013" . 14634 17165
      "B2010" . 15124 17271
      "B2010" . 15124 17994
      "B2010" . 15124 17593
      "B2010" . 15124 18318
      "B2010" . 15124 18706
      "B2010" . 15124 19062
      "B2010" . 15124 19416
      "B2010" 1 15124 19789
      "B2010" . 15124 20151
      "B2010" . 15124 20523
      "B2010" . 15124 20893
      "B2010" . 15124 21271
      "B2010" . 15124 21627
      "B2010" . 15124 21998
      "B2010" . 15124 16909
      "B2010" . 15124 16583
      "B2054" . 17348 17654
      "B2054" . 17348 17843
      "B2054" . 17348 18102
      "B2054" . 17348 18548
      "B2074" . 17562 17849
      "B2074" . 17562 18332
      "B2074" . 17562 18474
      "B2074" . 17562 18850
      "B2074" . 17562 19237
      "B2074" . 17562 19788
      "B2074" . 17562 19863
      "B2074" . 17562 20257
      "B2074" . 17562 20742
      "B2074" . 17562 20973
      "B2074" . 17562 20249
      "B2074" . 17562 22694
      end
      format %td surg_date
      format %td followup_date


      • #4
        I'm not quite sure which date you want to impute between the previous follow-up and the current one. The code below picks the midpoint (rounded down to a whole day), but you could modify it as you want.

        Code:
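        * running count of positive follow-ups within patient (sum() treats missing as 0)
        bys id_code (surg_date followup_date): gen cum_malignancy = sum(malignancy)
        * the by-prefix keeps [_n-1] from reaching into the previous patient's rows
        by id_code: gen wanted = followup_date[_n-1] + floor(0.5*(followup_date-followup_date[_n-1])) if cum_malignancy == 1 & cum_malignancy[_n-1] == 0
        format %td wanted
        by id_code: drop if !(cum_malignancy == 1 & cum_malignancy[_n-1] == 0)
        drop cum_malignancy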
        which produces:
        Code:
        . li, noobs ab(20)
          +--------------------------------------------------------------+
          | id_code   malignancy   surg_date   followup_date      wanted |
          |--------------------------------------------------------------|
          |   B1020            1   18nov2006       08sep2012   08feb2012 |
          |   B2010            1   29may2001       07mar2014   01sep2013 |
          |   B2013            1   25jan2000       16jan2006   22dec2005 |
          +--------------------------------------------------------------+
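
        To see why the running sum isolates the first positive follow-up, here is a minimal sketch on the four B2013 rows from the example data. The variable names first_pos and midpoint are made up for illustration, and the by-prefix is my assumption about how to keep [_n-1] within one patient.

        ```stata
        clear
        input str5 id_code byte malignancy float followup_date
        "B2013" . 16585
        "B2013" . 16767
        "B2013" 1 16817
        "B2013" . 17165
        end
        format %td followup_date

        * sum() treats missing as 0, so the running sum is 0, 0, 1, 1
        bysort id_code (followup_date): gen cum_malignancy = sum(malignancy)

        * the first positive follow-up is where the running sum steps from 0 to 1
        * (first_pos is an illustrative name, not part of the code above)
        by id_code: gen first_pos = cum_malignancy == 1 & cum_malignancy[_n-1] == 0

        * midpoint between the previous and the current follow-up, rounded down
        by id_code: gen midpoint = followup_date[_n-1] + floor(0.5*(followup_date - followup_date[_n-1])) if first_pos
        format %td midpoint

        list, noobs sepby(id_code)
        ```

        The two follow-ups are 50 days apart, so the midpoint is 16767 + 25 = 16792, which is 22dec2005 and agrees with the B2013 row in the listing above.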
        Last edited by Hemanshu Kumar; 22 Nov 2022, 06:26.


        • #5
          Hemanshu Kumar Yes!! This works! Thank you so much. Hope you have a great day!!


          • #6
            Hi again (perhaps @Hemanshu Kumar), is it possible to make it pick a random date between the two follow-up dates?


            • #7
              runiformint() is presumably what you seek here. I am guessing that you mean a draw from a uniform (flat, rectangular) distribution.

              Code:
              . di %td runiformint(mdy(1,1,2022), mdy(11,25,2022))
              28jun2022
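
              As a minimal sketch of the same idea for a single pair of follow-ups, the bounds can be shifted by one day so that the draw falls strictly between them. The two dates are taken from the listing in the first post purely for illustration, and the seed is arbitrary.

              ```stata
              * illustrative dates: the follow-ups on either side of the first "YES" in post #1
              local prev = mdy(8, 17, 2020)
              local curr = mdy(10, 12, 2021)

              set seed 2022                     // arbitrary seed, only for reproducibility

              * prev + 1 and curr - 1 exclude both follow-up dates themselves
              local draw = runiformint(`prev' + 1, `curr' - 1)
              display %td `draw'
              ```

              Dropping the + 1 or the - 1 would allow the draw to coincide with the earlier or the later follow-up, respectively.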
              Note: Hemanshu Kumar is extremely active and helpful -- yet I can't speak on his behalf, naturally. But in general replying to a question is not volunteering to answer your questions in future. It's best not to ping an individual unless there is a very specific reason to do that, e.g. that they wrote a particular command.


              • #8
                Sure, something like this:
                Code:
                bys id_code (surg_date followup_date): gen cum_malignancy = sum(malignancy)
                * the by-prefix keeps [_n-1] from reaching into the previous patient's rows
                by id_code: gen num_days = (followup_date-followup_date[_n-1])-1 if cum_malignancy == 1 & cum_malignancy[_n-1] == 0
                set seed 123
                by id_code: gen wanted = followup_date[_n-1] + runiformint(1,num_days) if cum_malignancy == 1 & cum_malignancy[_n-1] == 0
                format %td wanted
                by id_code: drop if !(cum_malignancy == 1 & cum_malignancy[_n-1] == 0)
                drop cum_malignancy num_days
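                where two of the 1's, namely the -1 in the definition of num_days and the lower bound 1 in runiformint(), ensure that the random date is strictly between the two follow-up dates. If you want to allow it to be equal to one or both, you should change one or both of those to zeros, as appropriate: a lower bound of 0 allows the previous follow-up date, and replacing the -1 with -0 allows the current one.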

                The result:
                Code:
                . li, noobs ab(20)
                  +--------------------------------------------------------------+
                  | id_code   malignancy   surg_date   followup_date      wanted |
                  |--------------------------------------------------------------|
                  |   B1020            1   18nov2006       08sep2012   19dec2011 |
                  |   B2010            1   29may2001       07mar2014   09dec2013 |
                  |   B2013            1   25jan2000       16jan2006   14jan2006 |
                  +--------------------------------------------------------------+
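
                One case worth checking is a pair of follow-ups on consecutive days, where no date lies strictly between them. The sketch below uses a made-up two-patient dataset (ids A1 and A2 and the variable prev_date are hypothetical) and leaves wanted missing in that case explicitly, rather than relying on how runiformint() behaves when its lower bound exceeds its upper bound.

                ```stata
                clear
                input str5 id_code byte malignancy float followup_date
                "A1" . 19000
                "A1" 1 19010
                "A2" . 19100
                "A2" 1 19101
                end
                format %td followup_date

                * previous follow-up within the same patient (missing on the first row)
                bysort id_code (followup_date): gen prev_date = followup_date[_n-1]

                * number of whole days strictly between the two follow-ups
                gen num_days = followup_date - prev_date - 1 if malignancy == 1

                set seed 123
                * draw only when at least one day lies strictly in between
                gen wanted = prev_date + runiformint(1, num_days) ///
                    if malignancy == 1 & num_days >= 1 & !missing(num_days)
                format %td prev_date wanted

                * every imputed date must lie strictly between the two follow-ups
                assert wanted > prev_date & wanted < followup_date if !missing(wanted)

                list, noobs sepby(id_code)
                ```

                Here A1 gets a date between its two follow-ups, while A2 keeps a missing value because its follow-ups are only one day apart.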


                • #9
                  Thank you Nick! Yes, of course. I stupidly did not think about that; I was just super happy with the code I got help with earlier, but it is very obvious now that you have pointed it out.

                  And thank you especially Hemanshu, this works great!
