  • Numbering sequential visits, even with multiple data rows per visit

    Hello, the following simulation uses the command
    Code:
    bysort var1(var2): generate var3=_n
    to create, within subject ID, sequential numbers for each successive clinic visit.

    However, note that the sequence number created [chemono] runs 1 through 6, rather than the 1 through 3 that is the desired outcome.
    In other words, I wish the data to reflect that each of these two subjects had 3 visits (parenthetically, my code already correctly calculates that within each visit they each received two medications).

    How would I modify the -bysort- command to achieve the correct visit sequence numbers?

    Code:
    clear
        input str2 id str9 datetx str11 drug  
        1 13mar2014   Carboplatin  
        1 13mar2014    Paclitaxel  
        1 03apr2014   Carboplatin  
        1 03apr2014    Paclitaxel  
        1 24apr2014   Carboplatin  
        1 24apr2014    Paclitaxel  
        2 15may2014   Cytoxan
        2 15may2014    Adriamycin
        2 05jun2014   Cytoxan
        2 05jun2014    Adriamycin
        2 26jun2014   Cytoxan
        2 26jun2014    Adriamycin
    end
    l, noo sepby(id)
    by datetx drug, sort: gen polychem = _n==1
    by datetx: replace polychem=sum(polychem)
    by datetx: replace polychem=polychem[_N]
    bysort id (datetx): generate chemono = _n
    l id chemono drug polychem datetx, noo sepby(id datetx)

  • #2
    First, you'll never get anywhere when your dates are represented as strings. They won't sort into the correct order. So the first step is to convert them to Stata internal format numeric dates. Then it's just a standard trick with -by-:

    Code:
    set more off
    clear
        input str2 id str9 datetx str11 drug  
        1 13mar2014   Carboplatin  
        1 13mar2014    Paclitaxel  
        1 03apr2014   Carboplatin  
        1 03apr2014    Paclitaxel  
        1 24apr2014   Carboplatin  
        1 24apr2014    Paclitaxel  
        2 15may2014   Cytoxan
        2 15may2014    Adriamycin
        2 05jun2014   Cytoxan
        2 05jun2014    Adriamycin
        2 26jun2014   Cytoxan
        2 26jun2014    Adriamycin
    end
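    * Convert the string date to a Stata internal form (SIF) daily date
    gen sif_date = date(datetx, "DMY")
    format sif_date %td
    
    * Flag the first row of each id-date group (each visit) ...
    by id sif_date, sort: gen seq = (_n == 1)
    * ... then take a running sum of the flags within id to number the visits
    by id: replace seq = sum(seq)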
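
    To illustrate why the conversion matters, here is a minimal sketch on a cut-down version of the example data (two visits for one subject). The variable seq_str is a hypothetical name introduced only for this comparison: it applies the same flag-and-running-sum steps to the string date, so the two numberings can be listed side by side.

```stata
clear
input str2 id str9 datetx str11 drug
1 13mar2014 Carboplatin
1 13mar2014 Paclitaxel
1 03apr2014 Carboplatin
1 03apr2014 Paclitaxel
end

* Numbering on the string date (seq_str is an illustrative name):
* "03apr2014" sorts alphabetically before "13mar2014"
by id datetx, sort: gen seq_str = (_n == 1)
by id: replace seq_str = sum(seq_str)

* Numbering on the numeric daily date, as in the code above
gen sif_date = date(datetx, "DMY")
format sif_date %td
by id sif_date, sort: gen seq = (_n == 1)
by id: replace seq = sum(seq)

* Compare the two numberings visit by visit
list id datetx drug seq_str seq, noobs sepby(id sif_date)
```

    With the string date, the April visit is numbered before the March visit; with the numeric date, the visits are numbered in calendar order, and both rows of a visit share the same number.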

    • #3
      Thank you Clyde for pointing out the SIF date issue! Am I correct in understanding that the command line
      Code:
      by id datetx, sort: gen chemono = (_n == 1)
      sequentially sums the variable -seq- within the row groups of the now properly ordered dates?

      • #4
        Well, your code creating the variable chemono in #1 and in #3 differs. So I'm not sure what you intend. If you can show an example that includes what you want the values of chemono to be in this same example data, I can check which code is correct, or find something else that will do it.
