Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Advice reshaping and appending two differently formatted data sets.

    Hi Statalist,
    I am having difficulty getting my dataset organized optimally. I have looked through the FAQs but haven’t quite found what I am looking for. Apologies if the info is there and thank you in advance for any advice. I am eager to learn more. Perhaps I am not asking the right questions.

    Problem: I have 2 sets of data for the same longitudinal observation study. Data Set 1 is in ‘wide’ format (study id 1-221, data manually entered into excel). Data Set 2 is in ‘long’ format (study id 222-418, exported from a RedCap longitudinal database to excel).
    I have attempted to reshape Data Set 1 to long format unsuccessfully, before I append it with Data Set 2.

    Context: There are 5 time points. T0, t1, t2, t3, t4.
    - Some variables were measured only at t0 (Injury-related variables e.g. loss of consciousness, vomiting).
    - Some variables were measured at only t2 and t4 (a 10-item scale of psychological distress).
    - I am hoping to examine the relationship between injury related variables at t0 and the Total scores on the 10-item scale at t2 and at t4.

    Desired outcome: A single data set for study id 1-418, where I can calculate the Total Scores on the 10-item scale at T2 and at T4 for each id that has scores at these time points, and examine relationships between these total scores and the t0 variables.

    In data set 2 I have also encoded the variable 'redcap_event_name' from a string to numeric and then tried to reshape using that but I just received errors.
    Please see the example data sets below. crf is an example of a measure taken at t0. the variables beginning with k10 are the items from the 10-item scale taken at t2 and t4.


    Data Set 1 Example:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int id byte(crf2_3a k10_01_t2 k10_02_t2 k10_01_t4 k10_02_t4)
      1 0  3  2  3  3
      2 1  1  3  .  .
      3 1  .  .  .  .
      4 0  3  1  .  .
      5 1  2  2  2  2
      6 0  2  1  1  2
      7 0  1  1  .  .
      8 1  1  2  1  1
      9 0  3  3  3  3
     10 0  2  2  .  .
     11 0  3  2  2  1
     12 0  2  2  1  2
     13 0 .a .a  2  1
     14 0  1  1  .  .
     15 0  .  .  .  .
     16 1  1  3  4  3
     17 0  2  2  2  3
     18 0  2  3  1  2
     19 .  .  .  .  .
     20 1  2  3  2  1
     21 0  3  3  3  3
     22 0  .  .  .  .
     23 0  2  1  2  1
     24 .  .  .  .  .
     25 .  .  .  .  .
     26 0  1  2  1  2
     27 0  .  .  .  .
     28 .  .  .  .  .
     29 0  2  2  .  .
     30 0  .  .  .  .
     31 0  3  2  1  2
     32 2  .  .  .  .
     33 0  5  4  4  4
     34 0  2  1  2  2
     35 1  .  .  .  .
     36 0  .  .  .  .
     37 0  .  .  .  .
     38 2  2  1  .  .
     39 0  1  1  1  2
     40 0  1  1  2  1
     41 0  .  .  .  .
     42 .  3  1  2  1
     43 0  3  3  3  2
     44 0  3  2  1  1
     45 2  1  2  1  1
     46 0  .  .  .  .
     47 0  1  1  1  1
     48 0  3  2  2  1
     49 1  .  .  .  .
     50 2  2  3  1  1
     51 1  1  3  2  1
     52 0  1  3  1  2
     53 0  .  .  .  .
     54 0  3  2  2  1
     55 2  2  1  .  .
     56 0  3  2  2  2
     57 9  4  2  1  2
     58 1  4  3  2  3
     59 1  1  1  1  2
     60 1  2  2  1  1
     61 0  2  1  1  1
     62 0  2  1  1  1
     63 0  .  .  .  .
     64 2  3  2  3  3
     65 0  3  3  5  2
     66 0  2  1  3  2
     67 1  3  3  3  2
     68 0  2  1  2  1
     69 0  2  1  1  1
     70 2  1  2  1  2
     71 0  2  2  2  2
     72 1  .  .  .  .
     73 0  1  1  1  1
     74 0  3  2  4  1
     75 0  3  1  1  1
     76 0  2  2  2  2
     77 0  1  2  1  1
     78 9  3  3  .  .
     79 0  1  1  2  2
     80 0  .  .  .  .
     81 2  1  1  1  3
     82 0  .  .  .  .
     83 0  .  .  .  .
     84 1  3  4 .a .a
     85 1  .  .  .  .
     86 0  .  .  .  .
     87 0  1  1  2  2
     88 1  .  .  .  .
     89 1  .  .  .  .
     90 0  2  2  1  2
     91 2  .  .  .  .
     92 1  2  2  3  1
     93 1  .  .  .  .
     94 1  3  2  .  .
     95 0  1  1  .  .
     96 0  4  3  4  4
     97 9  .  .  .  .
     98 1  2  2  2  1
     99 0  2  2  2  2
    100 9  1  1  1  1
    end
    label values crf2_3a crf2_3albl
    label def crf2_3albl 0 "no", modify
    label def crf2_3albl 1 "yes", modify
    label def crf2_3albl 2 "suspected", modify
    label def crf2_3albl 9 "unknown", modify
    Data Set 2 Example:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int record_id_t0 str8 redcap_event_name int id byte(crf2_3a k10_01_t2 k10_02_t2 k10_01_t4 k10_02_t4)
    222 "t0_arm_1" 222 2 . . . .
    222 "t1_arm_1"   . . . . . .
    222 "t2_arm_1"   . . . . . .
    222 "t3_arm_1"   . . . . . .
    222 "t4_arm_1"   . . . . . .
    223 "t0_arm_1" 223 1 . . . .
    223 "t1_arm_1"   . . . . . .
    223 "t2_arm_1"   . . . . . .
    223 "t3_arm_1"   . . . . . .
    223 "t4_arm_1"   . . . . . .
    224 "t0_arm_1" 224 3 . . . .
    224 "t1_arm_1"   . . . . . .
    224 "t2_arm_1"   . . 3 1 . .
    224 "t3_arm_1"   . . . . . .
    224 "t4_arm_1"   . . . . 3 2
    225 "t0_arm_1" 225 0 . . . .
    225 "t1_arm_1"   . . . . . .
    225 "t2_arm_1"   . . 2 2 . .
    225 "t3_arm_1"   . . . . . .
    225 "t4_arm_1"   . . . . 1 1
    226 "t0_arm_1" 226 0 . . . .
    226 "t1_arm_1"   . . . . . .
    226 "t2_arm_1"   . . . . . .
    226 "t3_arm_1"   . . . . . .
    226 "t4_arm_1"   . . . . . .
    227 "t0_arm_1" 227 0 . . . .
    227 "t1_arm_1"   . . . . . .
    227 "t2_arm_1"   . . 5 2 . .
    227 "t3_arm_1"   . . . . . .
    227 "t4_arm_1"   . . . . 4 2
    228 "t0_arm_1" 228 0 . . . .
    228 "t1_arm_1"   . . . . . .
    228 "t2_arm_1"   . . . . . .
    228 "t3_arm_1"   . . . . . .
    228 "t4_arm_1"   . . . . . .
    229 "t0_arm_1" 229 0 . . . .
    229 "t1_arm_1"   . . . . . .
    229 "t2_arm_1"   . . 1 1 . .
    229 "t3_arm_1"   . . . . . .
    229 "t4_arm_1"   . . . . 1 1
    230 "t0_arm_1" 230 . . . . .
    230 "t1_arm_1"   . . . . . .
    230 "t2_arm_1"   . . 1 1 . .
    230 "t3_arm_1"   . . . . . .
    230 "t4_arm_1"   . . . . 2 1
    231 "t0_arm_1" 231 . . . . .
    231 "t1_arm_1"   . . . . . .
    231 "t2_arm_1"   . . . . . .
    231 "t3_arm_1"   . . . . . .
    231 "t4_arm_1"   . . . . . .
    232 "t0_arm_1" 232 1 . . . .
    232 "t1_arm_1"   . . . . . .
    232 "t2_arm_1"   . . 2 1 . .
    232 "t3_arm_1"   . . . . . .
    232 "t4_arm_1"   . . . . 2 2
    233 "t0_arm_1" 233 2 . . . .
    233 "t1_arm_1"   . . . . . .
    233 "t2_arm_1"   . . . . . .
    233 "t3_arm_1"   . . . . . .
    233 "t4_arm_1"   . . . . . .
    234 "t0_arm_1" 234 0 . . . .
    234 "t1_arm_1"   . . . . . .
    234 "t2_arm_1"   . . 1 1 . .
    234 "t3_arm_1"   . . . . . .
    234 "t4_arm_1"   . . . . 1 1
    235 "t0_arm_1" 235 0 . . . .
    235 "t1_arm_1"   . . . . . .
    235 "t2_arm_1"   . . 1 1 . .
    235 "t3_arm_1"   . . . . . .
    235 "t4_arm_1"   . . . . 1 1
    236 "t0_arm_1" 236 . . . . .
    236 "t1_arm_1"   . . . . . .
    236 "t2_arm_1"   . . . . . .
    236 "t3_arm_1"   . . . . . .
    236 "t4_arm_1"   . . . . 1 1
    237 "t0_arm_1" 237 2 . . . .
    237 "t1_arm_1"   . . . . . .
    237 "t2_arm_1"   . . . . . .
    237 "t3_arm_1"   . . . . . .
    237 "t4_arm_1"   . . . . . .
    238 "t0_arm_1" 238 2 . . . .
    238 "t1_arm_1"   . . . . . .
    238 "t2_arm_1"   . . 3 3 . .
    238 "t3_arm_1"   . . . . . .
    238 "t4_arm_1"   . . . . 1 1
    239 "t0_arm_1" 239 0 . . . .
    239 "t1_arm_1"   . . . . . .
    239 "t2_arm_1"   . . 1 2 . .
    239 "t3_arm_1"   . . . . . .
    239 "t4_arm_1"   . . . . 1 1
    240 "t0_arm_1" 240 1 . . . .
    240 "t1_arm_1"   . . . . . .
    240 "t2_arm_1"   . . . . . .
    240 "t3_arm_1"   . . . . . .
    240 "t4_arm_1"   . . . . . .
    241 "t0_arm_1" 241 1 . . . .
    241 "t1_arm_1"   . . . . . .
    241 "t2_arm_1"   . . . . . .
    241 "t3_arm_1"   . . . . . .
    241 "t4_arm_1"   . . . . . .
    end
    Kind Regards,

    James

  • #2
    Well, the reason you haven't found anything exactly like this is because this has a few distinctive wrinkles in it.

    The first data set needs a fairly straightforward -reshape long-. The hardest part is just getting the right list of variable stubs for the -reshape- command. You could write them out manually, or, as in the code below, you can generate a local macro containing them in a loop.

    The second data set is complicated. It is not in long layout It is a hybrid of long and wide layout, one that combines the worst of both worlds! It appears to be long because it does contain separate observations for each id for each time period. But notice that it also contains separate *_t2 and *_t4 variables (the hallmark of wide layout), and that these variables have missing values in all observations except the ones corresponding to the same time period. Notice also that the variable id is missing in most observations, and only the variable record_id_t0 actually encodes the subject ID in every observation. Similarly, crf2_3a is only recorded in one observation for each subject. Attempting to -reshape- this into long layout will just produce a useless jumble.

    As for the variable recap_event_name, it does not convey any information that is not recoverable from the other variables, and it has no counterpart in the first data set. So I choose to just ignore it, and the -collapse- command eliminates it altogether.

    So the first step is to make this data set truly wide, parallel to that of the original dataset1, with a single observation for each id, picking out the one non-missing value for id crf2_3a and each of the k10* variables among the various observations it provides. The -collapse- command does precisely this. Then the resulting data set can be -reshape-d long and -append-ed to the reshaped first data set.

    Putting this all together:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int id byte(crf2_3a k10_01_t2 k10_02_t2 k10_01_t4 k10_02_t4)
      1 0  3  2  3  3
      2 1  1  3  .  .
      3 1  .  .  .  .
      4 0  3  1  .  .
      5 1  2  2  2  2
      6 0  2  1  1  2
      7 0  1  1  .  .
      8 1  1  2  1  1
      9 0  3  3  3  3
     10 0  2  2  .  .
     11 0  3  2  2  1
     12 0  2  2  1  2
     13 0 .a .a  2  1
     14 0  1  1  .  .
     15 0  .  .  .  .
     16 1  1  3  4  3
     17 0  2  2  2  3
     18 0  2  3  1  2
     19 .  .  .  .  .
     20 1  2  3  2  1
     21 0  3  3  3  3
     22 0  .  .  .  .
     23 0  2  1  2  1
     24 .  .  .  .  .
     25 .  .  .  .  .
     26 0  1  2  1  2
     27 0  .  .  .  .
     28 .  .  .  .  .
     29 0  2  2  .  .
     30 0  .  .  .  .
     31 0  3  2  1  2
     32 2  .  .  .  .
     33 0  5  4  4  4
     34 0  2  1  2  2
     35 1  .  .  .  .
     36 0  .  .  .  .
     37 0  .  .  .  .
     38 2  2  1  .  .
     39 0  1  1  1  2
     40 0  1  1  2  1
     41 0  .  .  .  .
     42 .  3  1  2  1
     43 0  3  3  3  2
     44 0  3  2  1  1
     45 2  1  2  1  1
     46 0  .  .  .  .
     47 0  1  1  1  1
     48 0  3  2  2  1
     49 1  .  .  .  .
     50 2  2  3  1  1
     51 1  1  3  2  1
     52 0  1  3  1  2
     53 0  .  .  .  .
     54 0  3  2  2  1
     55 2  2  1  .  .
     56 0  3  2  2  2
     57 9  4  2  1  2
     58 1  4  3  2  3
     59 1  1  1  1  2
     60 1  2  2  1  1
     61 0  2  1  1  1
     62 0  2  1  1  1
     63 0  .  .  .  .
     64 2  3  2  3  3
     65 0  3  3  5  2
     66 0  2  1  3  2
     67 1  3  3  3  2
     68 0  2  1  2  1
     69 0  2  1  1  1
     70 2  1  2  1  2
     71 0  2  2  2  2
     72 1  .  .  .  .
     73 0  1  1  1  1
     74 0  3  2  4  1
     75 0  3  1  1  1
     76 0  2  2  2  2
     77 0  1  2  1  1
     78 9  3  3  .  .
     79 0  1  1  2  2
     80 0  .  .  .  .
     81 2  1  1  1  3
     82 0  .  .  .  .
     83 0  .  .  .  .
     84 1  3  4 .a .a
     85 1  .  .  .  .
     86 0  .  .  .  .
     87 0  1  1  2  2
     88 1  .  .  .  .
     89 1  .  .  .  .
     90 0  2  2  1  2
     91 2  .  .  .  .
     92 1  2  2  3  1
     93 1  .  .  .  .
     94 1  3  2  .  .
     95 0  1  1  .  .
     96 0  4  3  4  4
     97 9  .  .  .  .
     98 1  2  2  2  1
     99 0  2  2  2  2
    100 9  1  1  1  1
    end
    label values crf2_3a crf2_3albl
    label def crf2_3albl 0 "no", modify
    label def crf2_3albl 1 "yes", modify
    label def crf2_3albl 2 "suspected", modify
    label def crf2_3albl 9 "unknown", modify
    
    tempfile dataset1
    save `dataset1'
    
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input int record_id_t0 str8 redcap_event_name int id byte(crf2_3a k10_01_t2 k10_02_t2 k10_01_t4 k10_02_t4)
    222 "t0_arm_1" 222 2 . . . .
    222 "t1_arm_1"   . . . . . .
    222 "t2_arm_1"   . . . . . .
    222 "t3_arm_1"   . . . . . .
    222 "t4_arm_1"   . . . . . .
    223 "t0_arm_1" 223 1 . . . .
    223 "t1_arm_1"   . . . . . .
    223 "t2_arm_1"   . . . . . .
    223 "t3_arm_1"   . . . . . .
    223 "t4_arm_1"   . . . . . .
    224 "t0_arm_1" 224 3 . . . .
    224 "t1_arm_1"   . . . . . .
    224 "t2_arm_1"   . . 3 1 . .
    224 "t3_arm_1"   . . . . . .
    224 "t4_arm_1"   . . . . 3 2
    225 "t0_arm_1" 225 0 . . . .
    225 "t1_arm_1"   . . . . . .
    225 "t2_arm_1"   . . 2 2 . .
    225 "t3_arm_1"   . . . . . .
    225 "t4_arm_1"   . . . . 1 1
    226 "t0_arm_1" 226 0 . . . .
    226 "t1_arm_1"   . . . . . .
    226 "t2_arm_1"   . . . . . .
    226 "t3_arm_1"   . . . . . .
    226 "t4_arm_1"   . . . . . .
    227 "t0_arm_1" 227 0 . . . .
    227 "t1_arm_1"   . . . . . .
    227 "t2_arm_1"   . . 5 2 . .
    227 "t3_arm_1"   . . . . . .
    227 "t4_arm_1"   . . . . 4 2
    228 "t0_arm_1" 228 0 . . . .
    228 "t1_arm_1"   . . . . . .
    228 "t2_arm_1"   . . . . . .
    228 "t3_arm_1"   . . . . . .
    228 "t4_arm_1"   . . . . . .
    229 "t0_arm_1" 229 0 . . . .
    229 "t1_arm_1"   . . . . . .
    229 "t2_arm_1"   . . 1 1 . .
    229 "t3_arm_1"   . . . . . .
    229 "t4_arm_1"   . . . . 1 1
    230 "t0_arm_1" 230 . . . . .
    230 "t1_arm_1"   . . . . . .
    230 "t2_arm_1"   . . 1 1 . .
    230 "t3_arm_1"   . . . . . .
    230 "t4_arm_1"   . . . . 2 1
    231 "t0_arm_1" 231 . . . . .
    231 "t1_arm_1"   . . . . . .
    231 "t2_arm_1"   . . . . . .
    231 "t3_arm_1"   . . . . . .
    231 "t4_arm_1"   . . . . . .
    232 "t0_arm_1" 232 1 . . . .
    232 "t1_arm_1"   . . . . . .
    232 "t2_arm_1"   . . 2 1 . .
    232 "t3_arm_1"   . . . . . .
    232 "t4_arm_1"   . . . . 2 2
    233 "t0_arm_1" 233 2 . . . .
    233 "t1_arm_1"   . . . . . .
    233 "t2_arm_1"   . . . . . .
    233 "t3_arm_1"   . . . . . .
    233 "t4_arm_1"   . . . . . .
    234 "t0_arm_1" 234 0 . . . .
    234 "t1_arm_1"   . . . . . .
    234 "t2_arm_1"   . . 1 1 . .
    234 "t3_arm_1"   . . . . . .
    234 "t4_arm_1"   . . . . 1 1
    235 "t0_arm_1" 235 0 . . . .
    235 "t1_arm_1"   . . . . . .
    235 "t2_arm_1"   . . 1 1 . .
    235 "t3_arm_1"   . . . . . .
    235 "t4_arm_1"   . . . . 1 1
    236 "t0_arm_1" 236 . . . . .
    236 "t1_arm_1"   . . . . . .
    236 "t2_arm_1"   . . . . . .
    236 "t3_arm_1"   . . . . . .
    236 "t4_arm_1"   . . . . 1 1
    237 "t0_arm_1" 237 2 . . . .
    237 "t1_arm_1"   . . . . . .
    237 "t2_arm_1"   . . . . . .
    237 "t3_arm_1"   . . . . . .
    237 "t4_arm_1"   . . . . . .
    238 "t0_arm_1" 238 2 . . . .
    238 "t1_arm_1"   . . . . . .
    238 "t2_arm_1"   . . 3 3 . .
    238 "t3_arm_1"   . . . . . .
    238 "t4_arm_1"   . . . . 1 1
    239 "t0_arm_1" 239 0 . . . .
    239 "t1_arm_1"   . . . . . .
    239 "t2_arm_1"   . . 1 2 . .
    239 "t3_arm_1"   . . . . . .
    239 "t4_arm_1"   . . . . 1 1
    240 "t0_arm_1" 240 1 . . . .
    240 "t1_arm_1"   . . . . . .
    240 "t2_arm_1"   . . . . . .
    240 "t3_arm_1"   . . . . . .
    240 "t4_arm_1"   . . . . . .
    241 "t0_arm_1" 241 1 . . . .
    241 "t1_arm_1"   . . . . . .
    241 "t2_arm_1"   . . . . . .
    241 "t3_arm_1"   . . . . . .
    241 "t4_arm_1"   . . . . . .
    end
    tempfile dataset2
    save `dataset2'
    
    clear
    use `dataset1'
    //    BUILD A LIST OF THE K10 VARIABLES TO RESHAPE LONG
    local k10s
    forvalues i= 1/2 { // IN REAL DATA USE -forvalues i = 1/10-
        local ii: display %02.0f `i'
        local k10s `k10s' k10_`ii'_t
    }
    
    //    GO LONG AND SET ASIDE
    reshape long `k10s', i(id) j(time)
    tempfile holding
    save `holding'
    
    //    DEAL WITH THE SECOND DATA SET
    use `dataset2', clear
    //     VERIFY THAT EACH OF CRF2_31 AND K10* HAS
    //     ONLY ONE DISTINCT NON-MISSING VALUE PER ID
    foreach v of varlist id crf2_3a k10_* {
        by record_id_t0 (`v'), sort: assert `v' == `v'[1] | missing(`v')
    }
    //    MAKE THIS A TRUE WIDE DATA LAYOUT BY COLLAPSING TO ONE RECORD PER ID
    collapse (firstnm) id crf2_3a k10_*, by(record_id_t0)
    
    //    NOW REHSAPE LONG AND APPEND TO THE HOLDING DATA
    reshape long `k10s', i(id) j(time)
    drop record_id_t0
    
    append using `holding'
    isid id time, sort
    Thank you for using -dataex- on your very first post, and for posing your question clearly.

    Added: It dawns on me that if your data sets are very large, you might prefer to stick with the true wide layouts and -append- those first, following that by a single -reshape- in order to speed things up a bit. That would look like:

    Code:
    clear
    use `dataset1'
    //    BUILD A LIST OF THE K10 VARIABLES TO RESHAPE LONG
    local k10s
    forvalues i= 1/2 { // IN REAL DATA USE -forvalues i = 1/10-
        local ii: display %02.0f `i'
        local k10s `k10s' k10_`ii'_t
    }
    
    
    //    DEAL WITH THE SECOND DATA SET
    use `dataset2', clear
    //     VERIFY THAT EACH OF CRF2_31 AND K10* HAS 
    //     ONLY ONE DISTINCT NON-MISSING VALUE PER ID
    foreach v of varlist id crf2_3a k10_* {
        by record_id_t0 (`v'), sort: assert `v' == `v'[1] | missing(`v')
    }
    //    MAKE THIS A TRUE WIDE DATA LAYOUT BY COLLAPSING TO ONE RECORD PER ID
    collapse (firstnm) id crf2_3a k10_*, by(record_id_t0)
    drop record_id_t0
    
    //    APPEND TO DATASET 1
    append using `dataset1'
    
    //    NOW REHSAPE LONG 
    reshape long `k10s', i(id) j(time)
    
    isid id time, sort
    Last edited by Clyde Schechter; 27 May 2018, 11:20.

    Comment


    • #3
      Hi Clyde,

      Thank you for taking the time to wrangle my data examples, I will attempt these changes asap.
      It did seem as though they were not in true wide or long formats, and things were getting a little complicated. I experimented with plenty of commands but I couldn't quite get it.
      Lots to learn from this.
      This is enormously helpful.


      James

      Comment

      Working...
      X