Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Determining the earliest date from a list of observation based on certain criteria

    Dear Stata Master,

    I have a dataset with patient ID, date of treatment start, date of sputum collection, and sputum result. I need to calculate time in days to sputum conversion = d_coll - d_starttx. Sputum conversion is the first negative result of two consecutive negative results (without positive result in between).

    Here is the example of my data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input str13 regnum int(d_starttx d_coll) float res_cult_combine
    "002" 21643 21623 1
    "002" 21643 21630 1
    "002" 21643 21672 0
    "002" 21643 21704 0
    "002" 21643 21733 0
    "002" 21643 21767 2
    "002" 21643 21795 0
    "002" 21643 21823 0
    "002" 21643 21858 0
    "002" 21643 21886 0
    "002" 21643 21923 0
    "002" 21643 22117 0
    "002" 21643 22305 0
    "004" 21644 21570 1
    "004" 21644 21643 1
    "004" 21644 21672 1
    "004" 21644 21704 0
    "004" 21644 21733 0
    "004" 21644 21763 0
    "004" 21644 21795 0
    "004" 21644 21824 0
    "004" 21644 21857 0
    "004" 21644 21894 0
    "004" 21644 21922 0
    "032" 21687 21507 1
    "032" 21687 21507 1
    "032" 21687 21623 1
    "032" 21687 21682 1
    "032" 21687 21721 0
    "032" 21687 21749 1
    "032" 21687 21780 0
    "032" 21687 21819 0
    "032" 21687 21829 0
    "032" 21687 21868 0
    "032" 21687 21901 0
    "032" 21687 21956 0
    "032" 21687 21964 0
    "032" 21687 21992 0
    "032" 21687 22218 0
    "046" 21697 21663 1
    "046" 21697 21689 1
    "046" 21697 21725 0
    "046" 21697 21755 1
    "046" 21697 21790 0
    "046" 21697 21818 0
    "046" 21697 21846 0
    end
    format %td d_starttx
    format %td d_coll
    label values res_cult_combine res_cult
    label def res_cult 0 "Negative for M. tuberculosis", modify
    label def res_cult 1 "Positive for M. tuberculosis", modify
    label def res_cult 2 "Contaminated", modify
    label var d_starttx "Treatment start date" 
    label var d_coll "Sample Collection Date"
    I'm trying to use the following code:

    *culture to be considered for conversion: negative result, done at least 14 days after treatment starting date
    gen t_cult_rx=d_coll-d_starttx
    gen neg_cult=1 if d_coll>d_starttx+14 & res_cult_combine==0
    replace neg_cult=0 if neg_cult==.
    label define neg_cult 0 "not conversion" 1 "conversion"
    label val neg_cult neg_cult

    *rank all the culture done. 1 is the oldest
    gen nk_neg_cult=0 if neg_cult==1
    replace nk_neg_cult=1 if neg_cult!=1
    gsort regnum nk_neg_cult d_coll res_cult_combine
    bys regnum: gen neg_cult_rank=_n if nk_neg_cult==0
    label var neg_cult_rank "older neg cult = 1"

    *identify date 1st negative
    bys regnum: gen f_neg_d_coll=d_coll if neg_cult_rank==1
    bys regnum: egen first_neg_d_coll=max(f_neg_d_coll)
    format first_neg_d_coll %d
    drop f_neg_d_coll

    *identify date 2nd negative
    bys regnum: gen s_neg_d_coll=d_coll if neg_cult_rank==2
    bys regnum: egen scnd_neg_d_coll=max(s_neg_d_coll)
    format scnd_neg_d_coll %d
    drop s_neg_d_coll

    I'm having difficulty for patients with a positive result in between negative result (regnum 032 and 046).

    Could anybody help me with this?

    Thank you in advance.

  • #2
    Try this instead of your code?
    Code:
    bys regnum d_starttx (d_coll): gen candidate = (datediff(d_starttx,d_coll,"day")>14 & res_cult == 0 & res_cult[_n+1] == 0)
    bys regnum d_starttx: egen _first_conversion = min(d_coll) if candidate
    bys regnum d_starttx: egen first_conversion = max(_first_conversion)
    format %td first_conversion
    
    gen days_to_conversion = datediff(d_starttx,first_conversion,"day")
    drop _first_conversion candidate
    which produces the following results (looking at just one observation for every regnum and d_starttx:

    Code:
    egen tag = tag(regnum d_starttx)
    li regnum d_starttx first_conversion days_to_conversion if tag, noobs
    
      +-------------------------------------------+
      | regnum   d_starttx   first_c~n   days_t~n |
      |-------------------------------------------|
      |    002   04apr2019   03may2019         29 |
      |    004   05apr2019   04jun2019         60 |
      |    032   18may2019   19aug2019         93 |
      |    046   28may2019   29aug2019         93 |
      +-------------------------------------------+
    Last edited by Hemanshu Kumar; 14 Sep 2022, 10:04.

    Comment


    • #3
      Hi Hemanshu Kumar , thank you for your swift response, Looking at the syntax, it seems to be correct. However, when I tried it in my stata, it said "unknown function datediff()". Do I need to be on certain version of Stata to be able to use this? I have Stata/IC 15.1 for Windows (64-bit x86-64). Revision 21 Mar 2019.

      Comment


      • #4
        Ah yes, -datediff()- was not in Stata 15. Luckily, it's not really needed here; you can find the duration in days with simple subtraction:

        Code:
        bys regnum d_starttx (d_coll): gen candidate = (d_coll - d_starttx > 14 & res_cult == 0 & res_cult[_n+1] == 0)
        bys regnum d_starttx: egen _first_conversion = min(d_coll) if candidate
        bys regnum d_starttx: egen first_conversion = max(_first_conversion)
        format %td first_conversion
        
        gen days_to_conversion = first_conversion - d_starttx
        drop _first_conversion candidate

        Comment

        Working...
        X