
  • Script produces a value even when observation doesn't meet the requirements

    Hi!

    I have an issue on which I previously received great help in this and this thread. Unfortunately, a new problem has come up. In short: I have a follow-up dataset of patients after surgery, and I want to investigate whether or not they develop malignancy after surgery. Patients are followed up six months post-op and annually thereafter. Since there is no date of diagnosis, I have used the code below to draw a random date of diagnosis between the last follow-up without a diagnosis and the follow-up at which malignancy was noted.
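    For a single patient, the date-drawing step is plain arithmetic. As a sketch, here it is for B2020, with the two follow-up dates taken from the -dataex- listing below; the local macro names (last_neg, first_pos, num_days, wanted) are illustrative only and not part of the original code.

```stata
* Worked example for patient B2020 (dates copied from the -dataex- listing).
* Macro names are illustrative, not from the original code.
local last_neg  = 17499   // last follow-up without malignancy (cum_malignancy == 0)
local first_pos = 17822   // first follow-up where malignancy is noted
local num_days  = (`first_pos' - `last_neg') - 1
display `num_days'

* runiformint(1, n) returns a whole number from 1 to n, so the drawn date
* falls strictly between the two follow-up dates
set seed 123
local wanted = `last_neg' + runiformint(1, `num_days')
display %td `wanted'
```

    The 322 days computed here match the num_days value stored for B2020's last row in the listing.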

    I had an issue with negative values, which I thought arose because malignancy was noted at the first follow-up after surgery, and I received advice on how to handle that case (see below). However, I have now found cases where malignancy is noted at the first post-operative follow-up (i.e. there is no row with cum_malignancy == 0) and the code still produces a value. That value then makes time_to_malignancy negative, because wanted places the malignancy before surgery, which is impossible in this patient selection.

    The code I am using:
    Code:
    bys id_code (surg_date followup_date): gen cum_malignancy = sum(malignancy)  
    gen num_days = (followup_date-followup_date[_n-1])-1 if cum_malignancy == 1 & cum_malignancy[_n-1] == 0  
    set seed 123
    gen wanted = followup_date[_n-1] + runiformint(1,num_days) if cum_malignancy == 1 & cum_malignancy[_n-1] == 0  
    
    replace num_days = followup_date - surg_date if num_days<0
    replace wanted = surg_date + runiformint(1,num_days) if !missing(num_days) & missing(wanted)
    
    format %td wanted
    
    drop if !(cum_malignancy == 1 & cum_malignancy[_n-1] == 0)
    
    gen time_to_malignancy = datediff(surg_date, wanted, "day")
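    One possible explanation, offered as a hypothesis: the two gen lines and the drop line that use [_n-1] run without a by id_code: prefix, so on a patient's first row followup_date[_n-1] and cum_malignancy[_n-1] belong to the previous patient. That would fit B1010 (num_days of -5440) and B4040 (a wanted date before its surgery date). The sketch below redoes the steps with every lagged reference inside the by-group, using the B2020, B3030 and B4040 rows from the -dataex- listing. first_pos and lower are new, illustrative variable names, and for a first-visit diagnosis the sketch draws strictly between surgery and that visit, one day narrower than the original followup_date - surg_date range. datediff() needs Stata 18 or later; with daily dates, wanted - surg_date gives the same number of days.

```stata
* Sketch: every [_n-1] is evaluated under -by id_code:-, so a patient's
* first row never reads the previous patient's last row.
* Rows copied from the -dataex- listing; first_pos and lower are illustrative names.
clear
input str16 id_code float malignancy long surg_date float followup_date
"B2020" 0 15918 16342
"B2020" 0 15918 16799
"B2020" 0 15918 17074
"B2020" 0 15918 17499
"B2020" 1 15918 17822
"B3030" 1 21067 21174
"B4040" 1 17347 17714
"B4040" 1 17347 18079
"B4040" 1 17347 18449
end
format %td surg_date followup_date

bysort id_code (surg_date followup_date): gen cum_malignancy = sum(malignancy)

* First positive follow-up; _n == 1 catches patients positive at their first visit
by id_code (surg_date followup_date): gen byte first_pos = cum_malignancy == 1 & (_n == 1 | cum_malignancy[_n-1] == 0)

* Lower bound: this patient's previous follow-up, or the surgery date if none
by id_code (surg_date followup_date): gen lower = cond(_n == 1, surg_date, followup_date[_n-1]) if first_pos
gen num_days = followup_date - lower - 1 if first_pos

set seed 123
gen wanted = lower + runiformint(1, num_days) if first_pos
format %td wanted

keep if first_pos
gen time_to_malignancy = datediff(surg_date, wanted, "day")
list id_code surg_date followup_date wanted time_to_malignancy
```

    On these rows every wanted date falls after surgery and before the follow-up at which malignancy was first recorded.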
    Dataex:
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input str16 id_code float malignancy long surg_date float(followup_date cum_malignancy num_days wanted)
    "B1010" 1 15910 16310  1 -5440     .
    "B1010" 1 15910 16642  2     .     .
    "B1010" 1 15910 17027  3     .     .
    "B1010" 1 15910 17372  4     .     .
    "B1010" 1 15910 17741  5     .     .
    "B1010" 1 15910 18105  6     .     .
    "B1010" 1 15910 18469  7     .     .
    "B1010" 1 15910 18861  8     .     .
    "B2020" 0 15918 16342  0     .     .
    "B2020" 0 15918 16799  0     .     .
    "B2020" 0 15918 17074  0     .     .
    "B2020" 0 15918 17499  0     .     .
    "B2020" 1 15918 17822  1   322 17610
    "B3030" 1 21067 21174  1  3780 20739
    "B4040" 1 17347 17714 1 1634 16758
    "B4040" 1 17347 18079 2    .     .
    "B4040" 1 17347 18449 3    .     .
    "B5050" 0 15103 15526  0   .     .
    "B5050" 0 15103 15865  0   .     .
    "B5050" 0 15103 16199  0   .     .
    "B5050" 0 15103 16569  0   .     .
    "B5050" 0 15103 16931  0   .     .
    "B5050" 0 15103 17304  0   .     .
    "B5050" 0 15103 17660  0   .     .
    "B5050" 1 15103 18025  1 364 17778
    end
    format %td surg_date
    format %td followup_date
    format %td wanted
    In this example, only B2020 and B5050 behave correctly. I can handle B1010 with my line that replaces num_days when it is negative, but B4040 just does not make sense.
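    A quick way to test whether the odd values come from rows reading across patients is to list the rows where num_days is filled in on a patient's first observation: with the original code (no by: on the [_n-1] lines), such a value can only have been computed from the previous patient's last row. The sketch re-enters the id_code, surg_date, followup_date and num_days columns from the -dataex- listing; first_obs is a new, illustrative variable.

```stata
* Diagnostic on the -dataex- snapshot (columns copied from the listing).
* first_obs is an illustrative name, not part of the original code.
clear
input str16 id_code long surg_date float(followup_date num_days)
"B1010" 15910 16310 -5440
"B1010" 15910 16642     .
"B1010" 15910 17027     .
"B1010" 15910 17372     .
"B1010" 15910 17741     .
"B1010" 15910 18105     .
"B1010" 15910 18469     .
"B1010" 15910 18861     .
"B2020" 15918 16342     .
"B2020" 15918 16799     .
"B2020" 15918 17074     .
"B2020" 15918 17499     .
"B2020" 15918 17822   322
"B3030" 21067 21174  3780
"B4040" 17347 17714  1634
"B4040" 17347 18079     .
"B4040" 17347 18449     .
"B5050" 15103 15526     .
"B5050" 15103 15865     .
"B5050" 15103 16199     .
"B5050" 15103 16569     .
"B5050" 15103 16931     .
"B5050" 15103 17304     .
"B5050" 15103 17660     .
"B5050" 15103 18025   364
end
format %td surg_date followup_date

* Flag each patient's first follow-up row
bysort id_code (surg_date followup_date): gen byte first_obs = _n == 1

* A num_days value on a first row could only come from the previous patient
list id_code surg_date followup_date num_days if first_obs & !missing(num_days), noobs
count if first_obs & !missing(num_days)
```

    On these data it lists B1010, B3030 and B4040, and neither B2020 nor B5050.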
    Any ideas will be greatly appreciated!