Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Data management: extracting relevant event codes and their associated dates

    Hi

    My dataset is of hospital admissions, it is confidential but I will provide an example below. Each observation is an admission, and variables include things like patient id, patient demographics, dates, and details of operations and the operation dates. There are 339 000 observations and 150 variables

    The operation variables are named op01 op02 op03 op04 ... op30. Each value is a code (ACHI - Australian procedure codes) for the operation that happened. The date of the operation are seperate variables opdate01, opdate02, update03 etc.

    The dataset includes every operation that happened for every admission for all the patients in my study. I only care about a small selection of operations. I want to filter out the operations and dates I dont care about, and create new variables for the ones I do, to enable me to examine rates over time, and perform survival analysis. I imagine the variable names would be similar rev_op01 v_op02 v_date02 v_date02, and cell contents would still contain the ACHI codes.

    I can't figure out how to do this.

    I am using State 17 on a mac.

    Example data set. For this purpose let's say I care about the operation codes 3451800 and 3531000 (though there are actually about 20 relevant codes). Dates are in Stata DMY %td format

    id op01 op02 op03 op04 opdate01 opdate02 opdate03 opdate04
    1 3047301 9520300 23Aug17 24Aug17
    1 3451800 03Sept18
    1 9520202 20Dec20
    2 5650100 3040600 1310000 3531800 15Jul06 16Jul06 18Jul06 20Jul06
    2 5600100 01Feb08
    2 5640700 03Jul10
    3 3531000 3451800 06Jun07 08Jun07
    3 9555001 5650100 3040600 03May12 10May12 13May12
    3 3451800 27Nov14


    Thankyou

    Katherine






  • #2
    Code:
    clear
    
    input id long(op01 op02 op03 op04) str7(opdate01 opdate02 opdate03 opdate04) 
    1 3047301 9520300 .  . 23Aug17 24Aug17 "" ""
    1 3451800 . . .  03Sept18 "" "" ""
    1 9520202 . . .  20Dec20  "" "" ""
    2 5650100 3040600 1310000 3531800 15Jul06 16Jul06 18Jul06 20Jul06
    2 5600100 . . .  01Feb08 "" "" ""
    2 5640700 . . . 03Jul10 "" "" ""
    3 3531000 3451800 . .  06Jun07 08Jun07 "" "" 
    3 9555001 5650100 3040600 . 03May12 10May12 13May12 "" ""
    3 3451800 . . . 27Nov14 "" "" ""
    end 
    
    gen long newid = _n 
    reshape long op opdate, i(newid) j(which) string 
    keep if inlist(op, 3451800,3531000) 
    
    list 
    
         +----------------------------------------+
         | newid   which   id        op    opdate |
         |----------------------------------------|
      1. |     2      01    1   3451800   03Sept1 |
      2. |     7      01    3   3531000   06Jun07 |
      3. |     7      02    3   3451800   08Jun07 |
      4. |     9      01    3   3451800   27Nov14 |
         +----------------------------------------+

    Comment


    • #3
      Thanks Nick, I hadn't thought of reshaping, this works.



      Comment

      Working...
      X