Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • trying to merge variable from two databases please help!

    Hello,

    I have a prospectively collected database that is following de-indentified patients after a surgical procedure. Unfortunately the data is collected in a manner which separates the base demographic/admission data and the followup data into two different excel spreadsheets. Even worse the follow up data has multiple entries for each unique patient ID for every clinic visit after surgery. I am trying to take the last date of followup from the followup data and insert it into a new followup variable in my base dataset. Any help or suggestions?

    Thank you!
    Waseem Lutfi

    . dataex ncdrpatientid f_assessmentdate, c(25)

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long ncdrpatientid int f_assessmentdate
      47442 20990
      47442 21333
    1076017 20234
    1619281 19265
    1743905 19611
    1743905 19940
    2086724 20026
    2086724 20360
    2308651 19263
    2330403 19100
    2334626 20101
    2403625 19260
    2403625 19481
    2405597 19338
    2411052 19261
    2411052 19607
    2415986 19233
    2415986 19572
    2428193 21824
    2428193 22153
    2447381 19369
    2448183 19086
    2448183 19397
    2448207 19397
    2460642 19326
    end
    format %tdnn/dd/CCYY f_assessmentdate
    ------------------ copy up to and including the previous line ------------------

    Listed 25 out of 3209 observations

    . frame change base

    . dataex ncdrpatientid, c(25)

    ----------------------- copy starting from the next line -----------------------
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long ncdrpatientid
      47442
    1076017
    1619281
    1873602
    1875342
    2086724
    2244374
    2308651
    2330403
    2334626
    2341444
    2403625
    2405597
    2411052
    2415986
    2428193
    2447381
    2448183
    2448207
    2460642
    2465198
    2467096
    2472346
    2481028
    2481193
    end
    ------------------ copy up to and including the previous line ------------------

    Listed 25 out of 1882 observations

  • #2
    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long ncdrpatientid int f_assessmentdate
      47442 20990
      47442 21333
    1076017 20234
    1619281 19265
    1743905 19611
    1743905 19940
    2086724 20026
    2086724 20360
    2308651 19263
    2330403 19100
    2334626 20101
    2403625 19260
    2403625 19481
    2405597 19338
    2411052 19261
    2411052 19607
    2415986 19233
    2415986 19572
    2428193 21824
    2428193 22153
    2447381 19369
    2448183 19086
    2448183 19397
    2448207 19397
    2460642 19326
    end
    format %tdnn/dd/CCYY f_assessmentdate
    tempfile follow_up
    save `follow_up'
    
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input long ncdrpatientid
      47442
    1076017
    1619281
    1873602
    1875342
    2086724
    2244374
    2308651
    2330403
    2334626
    2341444
    2403625
    2405597
    2411052
    2415986
    2428193
    2447381
    2448183
    2448207
    2460642
    2465198
    2467096
    2472346
    2481028
    2481193
    end
    tempfile baseline
    save `baseline'
    
    use `follow_up', clear
    by ncdrpatientid (f_assessmentdate), sort: egen last_assessment_date ///
        = max(f_assessmentdate)
    format last_assessment_date %td
    
    merge m:1 ncdrpatientid using `baseline'
    It isn't clear what you want to accomplish by merging the baseline and follow-up data sets. At least in the example data shown, the baseline data set has no information not already found in the follow-up data set. But perhaps that is not true of the full data sets--perhaps there are other variables you chose not to show here.

    Comment


    • #3
      Thanks for the reply! Correct there are over 100 variables in the base dataset.

      as you can see in the base data set each ID appears only once at the start of each row while in the follow up dataset a new row was created for each date that the patient was seen starting with the ID. What I am trying to do is take the last date of follow up only (since that’s the longest follow up time) and link it to the ID variable in the base dataset under a new variable. I’ll give your code here a shot and see if it works out.
      Last edited by Waseem Lutfi; 12 Jul 2023, 19:28.

      Comment


      • #4
        What I am trying to do is take the last date of follow up only (since that’s the longest follow up time) and link it to the ID variable in the base dataset under a new variable.
        I'm not sure what you mean by this. The only thing you can link between two data sets is a variable that they both have in common. So only the ID variable can be a link in your situation.

        If "take the last date of followup only" means that you want to discard the observations in the follow-up data set that are other than the final one, then the code would be:
        Code:
        use `follow_up', clear
        by ncdrpatientid (f_assessmentdate), sort: keep if _n == _N
        rename f_assessmentdate final_assessment_date
        merge 1:1 ncdrpatientid using `baseline'

        Comment


        • #5
          It worked! Thank you so much. Wow this looks great. I appreciate your time. I'll add this syntax to my armamentarium.

          Best,
          Waseem

          Comment

          Working...
          X