Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Combining two panel databases

    Dear Stata team,

    I was wondering if anyone could help me with the following:
    I have a panel database with patients ids, test dates, and the results of exams from a given machine x.
    Another database has the ids (not always the same in both databases), test dates (not always the same in both), and results from machine y.
    Some patients underwent exams x and y on different days.
    I would like to combine the two databases by id and test date but only matching the ids of patients that had both tests done within +/- 5 days.

    Any thoughts?
    Here is what the data look like:

    Database 1:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id double testdate float x
    1 21206 10
    1 21236 12
    1 21246 11
    2 21201 13
    2 21215 14
    2 21228 12
    3 21208 11
    3 21210 10
    3 21218  9
    end

    Database 2:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id double testdate float y
    1 21206 200
    1 21236 150
    1 21246 180
    2 21201 120
    2 21215 205
    2 21228 100
    3 21208  95
    3 21210 110
    3 21218 130
    end

    Thank you!

  • #2
    Well, this code will do what you asked, although I question whether it is really what you want.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id double testdate float x
    1 21206 10
    1 21236 12
    1 21246 11
    2 21201 13
    2 21215 14
    2 21228 12
    3 21208 11
    3 21210 10
    3 21218  9
    end
    
    tempfile dataset1
    save `dataset1'
    
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id double testdate float y
    1 21206 200
    1 21236 150
    1 21246 180
    2 21201 120
    2 21215 205
    2 21228 100
    3 21208  95
    3 21210 110
    3 21218 130
    end
    
    tempfile dataset2
    save `dataset2'
    
    use `dataset1'
    rangejoin testdate -5 5 using `dataset2', by(id)
    renam testdate testdate_x
    rename testdate_U testdate_y
    
    format testdate* %td
    
    list, noobs clean
    Note: -rangejoin- is by Robert Picard and is available from SSC. To use -rangejoin- you also need to have -rangestat-, by Robert Picard, Nick Cox, and Roberto Ferrer, also available from SSC.

    The problem arises from patients like id 3 where tests are done on both machines at short intervals. Since #3 has tests on both 24jan2018 and 26jan2018 on both machines, the end result is four paired observations. My guess is you only want two of those. But for that you would have to specify some rule that could be coded to retain only the desired pairings. (For example, you might want to retain, for each x-date, only the observation with the closest y-date, and if there is a tie for closest y-date, pick the pairing in which the x-date precedes the y-date, or something like that.)

    Comment


    • #3
      Thank you, Clyde.
      This is exactly what I wanted.
      (the issue with the example you described may not happen in my actual database - the one I shared was just a random example so you had an idea of the data structure).
      Much appreciated.

      Comment

      Working...
      X