Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Convert multiple similar variables into one variable with multiple observations (rows)

    Hello, I am very very to Stata and trying to teach myself but I am stuck on this issue. I don't know the exact terminology so my apologies if I mixed up some of my words.

    Essentially, I have a row for every hospital visit for each patient (could be anywhere from 1-100 rows). At it's most basic, each row has variables for patient ID, all outcomes (var1, var2, var3,...,var25), and the date of the hospital visit (as days from enrollment).

    The outcomes are all coded and are string variables.

    Code:
      
    ID outcome1 outcome2 outcome3 outcome4 outcomeN date visit
    001 X1 X2 X3 5 1
    001 X4 X5 45 2
    001 X1 X2 X5 X6 X7 104 3
    001 X1 200 4
    001 X6 302 5
    001 X1 X2 333 6
    002 X4 32 1
    002 X7 100 2
    002 X1 X2 121 3
    My question is if there is a somewhat automated way to break down each row into a separate row for each outcome under one outcome variable, maintaining the ID, the outcome variable string, and the date associated.

    So using the first row of the example above:

    Code:
    Starts as:
      
    ID outcome1 outcome2 outcome3 outcome4 outcomeN date visit
    001 X1 X2 X3 5 1
    Convert to:
    ID outcome date visit
    001 X1 5 1
    001 X2 5 1
    001 X3 5 1
    I surely can long-code this, but as noted there are as many as 25 outcome variables for each now, and any number of visits for each patient, so I am hoping there is a loop or script that can automate this.

    Apologies, for the long post, I hope this makes sense.
    Last edited by Nate Fitzpatrick; 03 Nov 2021, 08:46.

  • #2
    Nate:
    see -reshape-.
    Kind regards,
    Carlo
    (Stata 19.0)

    Comment


    • #3
      Following on Carlo's advice, here's an example using your example data.
      Code:
      * Example generated by -dataex-. For more info, type help dataex
      clear
      input str3 id str2(outcome1 outcome2 outcome3 outcome4 outcome5) int date byte visit
      "001" "X1" "X2" "X3" ""   ""     5 1
      "001" "X4" "X5" ""   ""   ""    45 2
      "001" "X1" "X2" "X5" "X6" "X7" 104 3
      "001" "X1" ""   ""   ""   ""   200 4
      "001" "X6" ""   ""   ""   ""   302 5
      "001" "X1" "X2" ""   ""   ""   333 6
      "002" "X4" ""   ""   ""   ""    32 1
      "002" "X7" ""   ""   ""   ""   100 2
      "002" "X1" "X2" ""   ""   ""   121 3
      end
      reshape long outcome, i(id date visit) j(ocnum)
      drop if missing(outcome)
      list, sepby(id)
      Code:
      . reshape long outcome, i(id date visit) j(ocnum)
      (j = 1 2 3 4 5)
      
      Data                               Wide   ->   Long
      -----------------------------------------------------------------------------
      Number of observations                9   ->   45          
      Number of variables                   8   ->   5           
      j variable (5 values)                     ->   ocnum
      xij variables:
               outcome1 outcome2 ... outcome5   ->   outcome
      -----------------------------------------------------------------------------
      
      . drop if missing(outcome)
      (27 observations deleted)
      
      . list, sepby(id)
      
           +--------------------------------------+
           |  id   date   visit   ocnum   outcome |
           |--------------------------------------|
        1. | 001      5       1       1        X1 |
        2. | 001      5       1       2        X2 |
        3. | 001      5       1       3        X3 |
        4. | 001     45       2       1        X4 |
        5. | 001     45       2       2        X5 |
        6. | 001    104       3       1        X1 |
        7. | 001    104       3       2        X2 |
        8. | 001    104       3       3        X5 |
        9. | 001    104       3       4        X6 |
       10. | 001    104       3       5        X7 |
       11. | 001    200       4       1        X1 |
       12. | 001    302       5       1        X6 |
       13. | 001    333       6       1        X1 |
       14. | 001    333       6       2        X2 |
           |--------------------------------------|
       15. | 002     32       1       1        X4 |
       16. | 002    100       2       1        X7 |
       17. | 002    121       3       1        X1 |
       18. | 002    121       3       2        X2 |
           +--------------------------------------+
      Please note the use of the output of the dataex command to present example data. With realistic example data presented with dataex this example would have been more closely tailored to your actual data.

      Comment


      • #4
        Originally posted by Carlo Lazzaro View Post
        Nate:
        see -reshape-.
        Thank you Carlo! I took some time to read through the Stata manual for reshape, and specifically for the example above I worked out:
        Code:
        rename outcomen outcome5
        
        reshape long outcome, i(id date visit) j(outcomenumber)
        
        drop if missing(outcome)
        Which seems to get me exactly what I want:
        Code:
         
        id date visit outcomenumber outcome
        1 5 1 1 X1
        1 5 1 2 X2
        1 5 1 3 X3
        1 45 2 1 X4
        1 45 2 2 X5
        1 104 3 1 X1
        1 104 3 2 X2
        1 104 3 3 X5
        1 104 3 4 X6
        1 104 3 5 X7
        1 200 4 1 X1
        1 302 5 1 X6
        1 333 6 1 X1
        1 333 6 2 X2
        2 32 1 1 X4
        2 100 2 1 X7
        2 121 3 1 X1
        2 121 3 2 X2
        Which is perfect, but I have 1 question about the code to be clarified, if you have a moment.

        The manual states that 'i( )' is the unique identifier for the record in wide format. I am assuming that this means that any and all variables included in the 'i( )' will be maintained based on their correspondence with each observation in the variable to be reshaped (in this case "outcome")? And the 'j' is simply the new "long" variable to take the place of the multiple repeated "wide" variables?

        Thank you again!

        Comment

        Working...
        X