Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Find first occurance of values in several variables

    Hi,

    I am working with a dataset containing visits to a clinic. Each date the patient is given a set of diagnoses (DIA*). In the full dataset up to 30 diagnoses per visit_date are registered. A patient (id) may have 1 visit in total, up to several hundreds of visits, each visit with a number of diagnoses.

    I want to find the first occurance of each diagnosis for each patient. I.e. the date the patient received the diagnosis for the first time.


    This is a working excerpt of my dataset.
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long id int visit_date str6 DIA1 str5 DIA2 str7 DIA3 str5(DIA4 DIA5)
    6 18437 "I639" ""     ""      ""      ""    
    6 18440 "I639" "I109" "N390X" ""      ""    
    6 20193 "M485" "I109" "M809"  "E780"  ""    
    6 20203 "J159" "R651" "I109"  "N179"  ""    
    9 21147 "A099" "I109" "G629"  "L899F" "E869"
    9 21154 "L979" "G629" "N390"  "B964"  "L309"
    9 21177 "A415" "R572" "G629"  "L899F" "N179"
    end
    format %td visit_date



    This is the desired result
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long id int visit_date str6 new_DIA1 str7(new_DIA2 new_DIA3) str5(new_DIA4 new_DIA5)
    6 18437 "I639" ""      ""     ""      ""    
    6 18440 "I109" "N390X" ""     ""      ""    
    6 20193 "M485" "M809"  "E780" ""      ""    
    6 20203 "J159" "R651"  "N179" ""      ""    
    9 21147 "A099" "I109"  "G629" "L899F" "E869"
    9 21154 "L979" "N390"  "B964" "L309"  ""    
    9 21177 "A415" "R572"  "N179" ""      ""    
    end
    format %td visit_date

    If anyone has an idea for a solution I would be very grateful.

    Best regards,

    Jesper Eriksson


  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long id int visit_date str6 DIA1 str5 DIA2 str7 DIA3 str5(DIA4 DIA5)
    6 18437 "I639" ""     ""      ""      ""    
    6 18440 "I639" "I109" "N390X" ""      ""    
    6 20193 "M485" "I109" "M809"  "E780"  ""    
    6 20203 "J159" "R651" "I109"  "N179"  ""    
    9 21147 "A099" "I109" "G629"  "L899F" "E869"
    9 21154 "L979" "G629" "N390"  "B964"  "L309"
    9 21177 "A415" "R572" "G629"  "L899F" "N179"
    end
    format %td visit_date
    
    reshape long DIA, i(id visit_date) j(seq)
    
    drop if missing(DIA)
    
    bysort id DIA (visit_date) : gen first = DIA if _n == 1   
    
    reshape wide DIA first, i(id visit_date) j(seq)
    
    egen FIRST = concat(first*), p(" ")
    
    split FIRST 
    
    drop first* FIRST 
    
    list, sepby(id)
    
         +--------------------------------------------------------------------------------------------------+
         | id   visit_d~e   DIA1   DIA2    DIA3    DIA4   DIA5   FIRST1   FIRST2   FIRST3   FIRST4   FIRST5 |
         |--------------------------------------------------------------------------------------------------|
      1. |  6   24jun2010   I639                                   I639                                     |
      2. |  6   27jun2010   I639   I109   N390X                    I109    N390X                            |
      3. |  6   15apr2015   M485   I109    M809    E780            M485     M809     E780                   |
      4. |  6   25apr2015   J159   R651    I109    N179            J159     R651     N179                   |
         |--------------------------------------------------------------------------------------------------|
      5. |  9   24nov2017   A099   I109    G629   L899F   E869     A099     I109     G629    L899F     E869 |
      6. |  9   01dec2017   L979   G629    N390    B964   L309     L979     N390     B964     L309          |
      7. |  9   24dec2017   A415   R572    G629   L899F   N179     A415     R572     N179                   |
         +--------------------------------------------------------------------------------------------------+

    Comment


    • #3
      Originally posted by Nick Cox View Post
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input long id int visit_date str6 DIA1 str5 DIA2 str7 DIA3 str5(DIA4 DIA5)
      6 18437 "I639" "" "" "" ""
      6 18440 "I639" "I109" "N390X" "" ""
      6 20193 "M485" "I109" "M809" "E780" ""
      6 20203 "J159" "R651" "I109" "N179" ""
      9 21147 "A099" "I109" "G629" "L899F" "E869"
      9 21154 "L979" "G629" "N390" "B964" "L309"
      9 21177 "A415" "R572" "G629" "L899F" "N179"
      end
      format %td visit_date
      
      reshape long DIA, i(id visit_date) j(seq)
      
      drop if missing(DIA)
      
      bysort id DIA (visit_date) : gen first = DIA if _n == 1
      
      reshape wide DIA first, i(id visit_date) j(seq)
      
      egen FIRST = concat(first*), p(" ")
      
      split FIRST
      
      drop first* FIRST
      
      list, sepby(id)
      
      +--------------------------------------------------------------------------------------------------+
      | id visit_d~e DIA1 DIA2 DIA3 DIA4 DIA5 FIRST1 FIRST2 FIRST3 FIRST4 FIRST5 |
      |--------------------------------------------------------------------------------------------------|
      1. | 6 24jun2010 I639 I639 |
      2. | 6 27jun2010 I639 I109 N390X I109 N390X |
      3. | 6 15apr2015 M485 I109 M809 E780 M485 M809 E780 |
      4. | 6 25apr2015 J159 R651 I109 N179 J159 R651 N179 |
      |--------------------------------------------------------------------------------------------------|
      5. | 9 24nov2017 A099 I109 G629 L899F E869 A099 I109 G629 L899F E869 |
      6. | 9 01dec2017 L979 G629 N390 B964 L309 L979 N390 B964 L309 |
      7. | 9 24dec2017 A415 R572 G629 L899F N179 A415 R572 N179 |
      +--------------------------------------------------------------------------------------------------+
      Wonderful! Thanks Nick!

      Comment

      Working...
      X