  • Multiple values from same samples

    Hi all.

    I have been struggling with the issue for a few days and seriously need your help.

    I have a dataset that includes multiple blood test results from the same participants.

    It looks like this:

    Result ID   Patient ID   Test result
    1           11111        10
    2           11111        20
    3           11111        30
    4           11112        10
    5           11112        20

    I need to pick out the result with the highest test result value for each patient (Result IDs 3 and 5 in the table above, which were flagged in red).

    I tried the commands below, but they did not work: they drop observations that were meant to be preserved.

    bysort id: egen max_data=max(data)
    drop if data < max_data


    The commands above actually dropped some of the samples where data==max_data.

    Do you have any solutions for this situation?

    Thank you for your support.

    Ayaka


  • #2
    Ayaka:
    welcome to this forum.
    Do you mean something along the following lines?
    Code:
    . bysort Patient_id ( Result ): egen wanted=max( Test_result)
    
    . list
    
         +---------------------------------------+
         | Result   Patien~d   Test_r~t   wanted |
         |---------------------------------------|
      1. |      1      11111         10       30 |
      2. |      2      11111         20       30 |
      3. |      3      11111         30       30 |
      4. |      4      11112         10       20 |
      5. |      5      11112         20       20 |
         +---------------------------------------+
    
    . bysort Patient_id: drop if Test_result!= wanted
    
    
    . list
    
         +---------------------------------------+
         | Result   Patien~d   Test_r~t   wanted |
         |---------------------------------------|
      1. |      3      11111         30       30 |
      2. |      5      11112         20       20 |
         +---------------------------------------+
    
    .
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Hi Carlo.
      Thank you very much for checking my post!

      Yes, I tried those commands, but they are not working as I expected.
      I previously checked how many Patient IDs are included in the dataset, which is approx. 600.
      However, after those commands only 150 results are left.
      I am wondering if I am using the commands incorrectly or if it is an issue with the dataset itself.
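
A quick way to confirm the before-and-after counts described above is to tag one row per patient and count the tags. This is only a sketch: the variable names `patient_id` and `test_result` and the five toy rows are assumptions standing in for the real dataset.

```stata
* Toy data with hypothetical variable names (assumptions, not the real dataset)
clear
input result_id patient_id test_result
1 11111 10
2 11111 20
3 11111 30
4 11112 10
5 11112 20
end

* Tag exactly one observation per distinct patient, then count the tags
egen first_row = tag(patient_id)
count if first_row
local n_before = r(N)

* Keep each patient's maximum, then count the remaining rows
bysort patient_id: egen max_result = max(test_result)
keep if test_result == max_result
count
local n_after = r(N)

display "patients before: `n_before'   rows after: `n_after'"
```

When the comparison is exact, the row count after this step should be at least the patient count (more if a patient has tied maxima), so a smaller number suggests that the equality test is failing for some rows.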

      Kind regards,

      Ayaka


      • #4
        For numeric variables, Stata treats missing values as larger than any nonmissing value. Sorting from maximum to minimum when missing values are present may therefore produce what you observe. Carlo Lazzaro's suggestion in #2 should handle this, though. Another way is below:

        Code:
        gsort PatientID -Result
        by PatientID: keep if _n==1
        Note the negative sign before "Result" in the gsort command, where Result stands for the test result variable (Test_result in #2), not the result ID. See

        Code:
        help gsort
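
To make the point about missing values concrete, here is a small sketch on toy data; the variable names and values are assumptions, not taken from the poster's dataset. It shows that a missing value compares as larger than any number, that `egen`'s `max()` ignores missing values, and that `gsort` puts missing values last in a descending sort unless `mfirst` is specified.

```stata
* Toy data with hypothetical names; patient 11111 has one missing result
clear
input patient_id test_result
11111 10
11111 30
11111 .
11112 20
11112 10
end

* Missing (.) is treated as larger than every nonmissing number,
* so this comparison is true only for the missing row
count if test_result > 30
local n_above = r(N)

* egen's max() skips missing values when computing the group maximum
bysort patient_id: egen max_result = max(test_result)

* Descending sort: gsort places missing values last by default,
* so the first row per patient is the largest nonmissing result
gsort patient_id -test_result
by patient_id: keep if _n == 1
list
```

Note that a comparison such as `test_result < max_result` is false for a missing `test_result`, so a `drop if` written that way leaves missing rows in place, whereas the `gsort` route keeps exactly one row per patient.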
        Find a sample of your data that replicates your problem and present it using dataex.

        Code:
        sort PatientID
        dataex
        Last edited by Andrew Musau; 01 Nov 2023, 05:57.
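
One possibility that the replies above do not mention, offered here only as a hypothesis to check: `egen` creates a `float` variable by default, so if the test result is stored as a `double` with non-integer values, the stored maximum can differ slightly from the original value, and an `==`, `!=` or `<` comparison can then fail on the very row that holds the maximum. The sketch below uses made-up data and hypothetical variable names; requesting `egen double` is a commonly suggested way to keep full precision.

```stata
* Toy data (assumption): a double-precision result with a non-integer maximum
clear
input long patient_id double test_result
11111 1.05
11111 1.1
11112 0.5
11112 2
end

* Default storage type (float unless -set type- was changed) rounds the maximum
bysort patient_id: egen max_float = max(test_result)

* Requesting double keeps the maximum exactly as stored
bysort patient_id: egen double max_double = max(test_result)

* Rows holding the true maximum that the float copy fails to match
count if test_result == max_double & test_result != max_float
local n_mismatch = r(N)

* With the double copy, one row per patient survives
keep if test_result == max_double
list
```

If `n_mismatch` is above zero on the real data, the float copy of the maximum is the likely reason rows with the highest value are being dropped; if it is zero, the cause lies elsewhere and a `dataex` sample remains the best way to diagnose it.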
