Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Reformatting dataset

    Hello Statalists,

    I'm new to Statalist, I've read the forum guidance and tried to follow but please let me know if I did something wrong by mistake.

    I have been trying to reformat/restructure my dataset, which looks like this

    ​​​​​​
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int PATIENT_ID str31 OBSERVATION_NAME str18 VALUE float RESULTS_DATE
    872 "ALT" "23.0" 21790
    872 "ALT" "22.0" 20375
    872 "ALT" "17.0" 21804
    872 "BMI" "29.4" 21143
    872 "BMI" "30.0" 21561
    964 "ALT" "16.7" 22320
    964 "BMI" "28.2" 20324
    end
    format %td RESULTS_DATE

    to create a new dataset that contains a PATIENT_ID variable, separate variables for each value of OBSERVATION_NAME, where their observations carry the values of VALUE variable, and RESULTS_DATE could be expanded to multiple variables according to OBSERVATION_NAME. It should look like this


    PATIENT_ID ALT BMI RESULTS_DATE_ALT RESULTS_DATE_BMI
    872 23.0 29.4 21790 21143
    872 22.0 30.0 20375 21561
    872 17.0 . 21804 .
    964 16.7 28.2 . 20324


    I explored reshape, xpose, collapse options but could not get to where I wanted. Any advice is much appreciated. Thank you!
    Last edited by Khanh Ha; 22 Oct 2021, 05:30.

  • #2
    Looks like a standard reshape wide with a sorting condition.

    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input int PATIENT_ID str31 OBSERVATION_NAME str18 VALUE float RESULTS_DATE
    872 "ALT" "23.0" 21790
    872 "ALT" "22.0" 20375
    872 "ALT" "17.0" 21804
    872 "BMI" "29.4" 21143
    872 "BMI" "30.0" 21561
    964 "ALT" "16.7" 22320
    964 "BMI" "28.2" 20324
    end
    format %td RESULTS_DATE
    
    bys PATIENT_ID OBSERVATION_NAME (RESULTS_DATE): gen which=_n
    reshape wide VALUE RESULTS_DATE, i(PATIENT_ID which) j(OBSERVATION_NAME) string
    Res.:

    Code:
    . l
    
         +----------------------------------------------------------------+
         | PATIEN~D   which   VALUEALT   RESULTS~T   VALUEBMI   RESULTS~I |
         |----------------------------------------------------------------|
      1. |      872       1       22.0   14oct2015       29.4   20nov2017 |
      2. |      872       2       23.0   29aug2019       30.0   12jan2019 |
      3. |      872       3       17.0   12sep2019                      . |
      4. |      964       1       16.7   09feb2021       28.2   24aug2015 |
         +----------------------------------------------------------------+

    Comment


    • #3
      It worked! I didn't think to put both PATIENT_ID and which into i dimension. Thanks so much Andrew!

      Comment

      Working...
      X