
  • Creating a randomized date between two followups or surgery

    Hi!

    I have an issue with a piece of code that I received help with in an earlier post, and I have now identified a further problem. I have a follow-up dataset of patients after surgery. I want to investigate whether or not they develop malignancy after surgery. Patients are followed up six months post-op and annually thereafter.
    The data looks like this:

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str16 id_code float malignancy long surg_date float followup_date
    "B1020" . 17123 17754
    "B1020" . 17123 17392
    "B1020" . 17123 18149
    "B1020" . 17123 18423
    "B1020" . 17123 18819
    "B1020" 1 17123 19244
    "B1020" . 17123 19523
    "B1020" . 17123 19988
    "B1020" . 17123 20114
    "B1020" . 16112 16592
    "B1020" . 16112 17842
    "B1020" . 16112 18201
    "B1020" . 16112 18759
    "B1020" . 16112 19147
    "B1020" . 16112 19348
    "B1020" . 16112 19371
    "B1020" . 16112 20146
    "B1020" . 16112 20529
    "B1020" . 16112 20873
    "B1020" . 16112 21314
    "B1020" . 16112 21624
    "B1020" . 16112 21838
    "B1020" . 16112 17582
    "B1020" . 16112 17123
    "B1020" . 16112 16172
    "B2012" . 17254 17223
    "B2012" . 17254 18335
    "B2012" . 17254 18719
    "B2012" . 17254 19142
    "B2012" . 17254 19416
    "B2012" . 17254 19783
    "B2012" . 17254 20117
    "B2012" . 17254 20501
    "B2012" . 17254 20832
    "B2012" . 17254 17801
    "B2012" . 17254 21321
    "B2012" . 17254 21549
    "B2012" . 17254 21999
    "B2013" 1 14634 16767
    "B2013" . 14634 16817
    "B2013" . 14634 16585
    "B2013" . 14634 17165
    "B2010" . 15124 17271
    "B2010" . 15124 17994
    "B2010" . 15124 17593
    "B2010" . 15124 18318
    "B2010" . 15124 18706
    "B2010" . 15124 19062
    "B2010" . 15124 19416
    "B2010" 1 15124 19789
    "B2010" . 15124 20151
    "B2010" . 15124 20523
    "B2010" . 15124 20893
    "B2010" . 15124 21271
    "B2010" . 15124 21627
    "B2010" . 15124 21998
    "B2010" . 15124 16909
    "B2010" . 15124 16583
    "B2054" . 17348 17654
    "B2054" . 17348 17843
    "B2054" . 17348 18102
    "B2054" . 17348 18548
    "B2074" 1 17562 17849
    "B2074" . 17562 18332
    "B2074" . 17562 18474
    "B2074" . 17562 18850
    "B2074" . 17562 19237
    "B2074" . 17562 19788
    "B2074" . 17562 19863
    "B2074" . 17562 20257
    "B2074" . 17562 20742
    "B2074" . 17562 20973
    "B2074" . 17562 20249
    "B2074" . 17562 22694
    end
    format %td surg_date
    format %td followup_date
    Since there is no date of diagnosis, I have used this code to draw a random date of diagnosis between the last follow-up at which there was no diagnosis and the follow-up at which the malignancy was noted:

    Code:
    bys id_code (surg_date followup_date): gen cum_malignancy = sum(malignancy)
    gen num_days = (followup_date-followup_date[_n-1])-1 if cum_malignancy == 1 & cum_malignancy[_n-1] == 0
    set seed 123
    gen wanted = followup_date[_n-1] + runiformint(1,num_days) if cum_malignancy == 1 & cum_malignancy[_n-1] == 0
    format %td wanted
    drop if !(cum_malignancy == 1 & cum_malignancy[_n-1] == 0)
    The issue is that some observations are left with a negative num_days and therefore no date, probably because the malignancy is noted at the first follow-up after surgery:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str16 id_code float malignancy long surg_date float(followup_date cum_malignancy num_days wanted)
    "B1020" 1 17123 19244 1  424 18980
    "B2010" 1 15124 19789 1  372 19701
    "B2013" 1 14634 16767 1  181 16635
    "B2074" 1 17562 17849 1 -700     .
    end
    format %td surg_date
    format %td followup_date
    format %td wanted
    Is there a way to write the script so that, if malignancy is noted at the first follow-up after surgery, it randomizes a date between surgery and the first follow-up? I would be extremely grateful for any help!
    Last edited by Vilma Antonov; 19 May 2023, 04:17.
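
    A note on where the negative value may come from: the `gen num_days` line above is not run under `by id_code:`, so on a patient's first row `followup_date[_n-1]` refers to the previous patient's last row. A minimal sketch of this, using three rows from the example data (`prev_id` and `prev_date` are names made up for this illustration):

    ```stata
    clear
    input str16 id_code float malignancy long surg_date float followup_date
    "B2054" . 17348 18548
    "B2074" 1 17562 17849
    "B2074" . 17562 18332
    end
    sort id_code surg_date followup_date

    * Without a by-prefix, [_n-1] crosses the patient boundary
    gen prev_id   = id_code[_n-1]
    gen prev_date = followup_date[_n-1]
    gen num_days  = (followup_date - prev_date) - 1 if malignancy == 1

    list id_code prev_id followup_date prev_date num_days, sepby(id_code)
    ```

    For B2074 the previous row belongs to B2054 (follow-up 18548), which gives 17849 - 18548 - 1 = -700, the value shown in the listing above.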

  • #2
    Try adding this line between the two existing lines, like this:

    Code:
    gen num_days = (followup_date-followup_date[_n-1])-1 if cum_malignancy == 1 & cum_malignancy[_n-1] == 0
    
    replace num_days = followup_date - surg_date if num_days < 0
    
    set seed 123



    • #3
      Thank you!

      Unfortunately, it produced a random date that is not within the intended period (for B2074, wanted is later than the follow-up date):
      [Attached screenshot: Skärmavbild 2023-05-19 kl. 16.47.50.png]


      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str16 id_code float malignancy long surg_date float(followup_date cum_malignancy num_days wanted)
      "B1020" 1 17123 19244 1 424 18980
      "B2010" 1 15124 19789 1 372 19701
      "B2013" 1 14634 16767 1 181 16635
      "B2074" 1 17562 17849 1 287 18648
      end
      format %td surg_date
      format %td followup_date
      format %td wanted
      Last edited by Vilma Antonov; 19 May 2023, 08:48.
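
      A quick way to spot such rows is a range check. The sketch below re-enters the four rows above and flags any `wanted` outside the window from surgery to the diagnosing follow-up. `out_of_range` is a name introduced here, and the window is deliberately coarse: it does not check against the previous follow-up.

      ```stata
      clear
      input str16 id_code long surg_date float(followup_date num_days wanted)
      "B1020" 17123 19244 424 18980
      "B2010" 15124 19789 372 19701
      "B2013" 14634 16767 181 16635
      "B2074" 17562 17849 287 18648
      end
      format %td surg_date followup_date wanted

      * Flag dates outside [surg_date, followup_date]
      gen byte out_of_range = !inrange(wanted, surg_date, followup_date)
      list id_code surg_date followup_date wanted if out_of_range
      ```

      Only B2074 is flagged: 18648 lies after its follow-up date of 17849. That value equals 18548 + 100, which is consistent with the draw still being added to the previous row's follow-up date (B2054's last visit) instead of the surgery date.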



      • #4
        Sorry, I didn't check the formula that generates "wanted" carefully. The cases diagnosed at the first follow-up can be handled at the end, after the main calculation:

        Code:
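        * Running count of malignancy diagnoses within each patient, in date order
        bys id_code (surg_date followup_date): gen cum_malignancy = sum(malignancy)
        
        * Days strictly between the previous row's follow-up and the diagnosing follow-up
        gen num_days = (followup_date-followup_date[_n-1])-1 if cum_malignancy == 1 & cum_malignancy[_n-1] == 0
        
        set seed 123
        gen wanted = followup_date[_n-1] + runiformint(1,num_days) if cum_malignancy == 1 & cum_malignancy[_n-1] == 0
        
        * Negative gap (diagnosis at the first follow-up): count from the surgery date instead
        replace num_days = followup_date - surg_date if num_days < 0
        replace wanted = surg_date + runiformint(1,num_days) if !missing(num_days) & missing(wanted)
        
        format %td wanted
        * Keep only the row on which the cumulative count first reaches 1
        drop if !(cum_malignancy == 1 & cum_malignancy[_n-1] == 0)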
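
        As a possible alternative (a sketch only, not the code posted in this thread), the lower bound can be computed within each patient so that `[_n-1]` never reaches into another patient's rows. The names `first_dx` and `lower_bound` are introduced here for illustration, and the six input rows are a subset of the example data. Note one difference in behaviour: this version excludes both endpoints in every case, whereas the code above can return the follow-up date itself for first-follow-up diagnoses.

        ```stata
        clear
        input str16 id_code float malignancy long surg_date float followup_date
        "B2013" 1 14634 16767
        "B2013" . 14634 16817
        "B2013" . 14634 16585
        "B2054" . 17348 18548
        "B2074" 1 17562 17849
        "B2074" . 17562 18332
        end

        bysort id_code (surg_date followup_date): gen cum_malignancy = sum(malignancy)

        * Row on which the count first reaches 1, evaluated within patient
        by id_code: gen byte first_dx = cum_malignancy == 1 & (_n == 1 | cum_malignancy[_n-1] == 0)

        * Lower bound: previous follow-up, or the surgery date on a patient's first row
        by id_code: gen lower_bound = cond(_n == 1, surg_date, followup_date[_n-1])

        set seed 123
        gen num_days = followup_date - lower_bound - 1 if first_dx
        * wanted stays missing if there is no day strictly between the bounds
        gen wanted = lower_bound + runiformint(1, num_days) if first_dx & num_days >= 1
        format %td surg_date followup_date lower_bound wanted

        * Every drawn date must fall strictly between the two bounds
        assert inrange(wanted, lower_bound + 1, followup_date - 1) if !missing(wanted)
        keep if first_dx
        ```

        With these rows, B2013 draws between its previous follow-up (16585) and 16767, while B2074 draws between its surgery date (17562) and 17849.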



        • #5
          Thank you so much, works perfectly now!
