Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Creating multiple new variable columns from an existing

    i have a dataset example below

    id visit_number date value test
    1 2 2020-10-01 16:00 140 sodium
    79 34 2011-07-10 09:15 . xray

    in these columns id is a patient unique identifier, visit_number is a visit unique identifier, data: date of visit ; test - has up to 65 different test names (coded as numbers) and value is the result of the test.

    few qs:

    1. i want to creat a unique identifier combining id and visit number, say A B to AB
    2. how can i create new variable columns from test using the test label using a foreach loop since there are 65 different variables in that column. i have created numeric labels for each test

    many thanks, hal



  • #2
    i tried

    foreach i of var test {
    bysort test: gen new`i'=value
    }

    but this generates just a full new column of value
    the same if i replace new' with new`var'
    Last edited by Hal Mangat; 14 Mar 2023, 23:17.

    Comment


    • #3
      To combine id and visit_number, you could, among other things, do this:
      Code:
      egen id_combined = concat(id visit_number)
      Giving a good answer to 2) would depend on us understanding the exact nature of your data. Consequently, you're much more likely to get a helpful response if you present a data example using the -dataex- command, as described in Section 12 of the FAQ (tab at the top of the screen) that new StataList members are asked to read. (Among other things, you say test has numeric values but show text, presumably because you have chosen to show the labels rather than the actual values.)

      I have difficulty understanding what you mean in #2, given some non-standard use of terminology. Do you mean that for each observation, you want a series of 65 indicator variables (0/1) to indicate which one of the possible tests was done?

      Both of the following being said: There's a reasonable chance that what you want to eventually do with your data doesn't require the variables you want to create here, so knowing something about how you want to use these variables might lead to a helpful answer in a different way.

      Comment


      • #4
        About creating a unique identifier from multiple variables also see

        Code:
        egen id_combined = group(id visit_number), label
        The second question I did not understand.

        Comment


        • #5
          Apologies, here is a sample of the data. in the column variable are multiple (lab test )variables with variable index being their numeric code, eval the value of the test variable followed by units. i have removed id column - all observations are linked to individuals.
          i have 62 different lab tests all in 1 column called variable, and want to create new variable columns for each test using a loop.



          start_date str66 variable byte variable_index float nval str14 units
          "23/01/2012 14:00" "SODIUM CHLORIDE 4 MEQ/ML IV SOLP" 48 85.47 "mL"
          "16/01/2012 20:45" "CREATININE (#2009)" 7 .6 "MG/DL"
          "26/01/2012 04:30" "SODIUM (#2000)" 46 135 "MMOL/L"
          "15/01/2012 03:15" "CREATININE (#2009)" 7 .75 "MG/DL"
          "17/01/2012 04:00" "CREATININE (#2009)" 7 .77 "MG/DL"
          "24/01/2012 03:53" "SODIUM CHLORIDE 4 MEQ/ML IV SOLP" 48 85.47 "mL"
          "22/01/2012 22:10" "SODIUM (#2000)" 46 134 "MMOL/L"
          "25/01/2012 03:15" "CREATININE (#2009)" 7 .72 "MG/DL"
          "25/01/2012 02:35" "SODIUM CHLORIDE 4 MEQ/ML IV SOLP" 48 85.47 "mL"
          "23/01/2012 03:05" "CREATININE (#2009)" 7 .64 "MG/DL"
          "15/01/2012 02:54" "Intensive Care Unit (ICU)" 28 . ""
          "19/01/2012 03:30" "CREATININE (#2009)" 7 .72 "MG/DL"
          "18/01/2012 04:05" "CREATININE (#2009)" 7 .89 "MG/DL"
          "21/01/2012 16:00" "SODIUM (#2000)" 46 138 "MMOL/L"
          "28/01/2012 03:55" "CREATININE (#2009)" 7 .69 "MG/DL"
          "20/01/2012 23:00" "SODIUM CHLORIDE 4 MEQ/ML IV SOLP" 48 85.47 "mL"
          "17/01/2012 10:00" "CHOLESTEROL (#2318)" 5 150 "MG/DL"
          "23/01/2012 23:30" "SODIUM (#2000)" 46 137 "MMOL/L"
          "21/01/2012 11:00" "SODIUM (#2000)" 46 134 "MMOL/L"
          "17/01/2012 10:00" "LDL (#2321)" 29 85 "MG/DL"
          "20/01/2012 03:00" "CREATININE (#2009)" 7 .68 "MG/DL"
          "22/01/2012 09:00" "SODIUM CHLORIDE 4 MEQ/ML IV SOLP" 48 85.47 "mL"
          "21/01/2012 03:00" "SODIUM (#2000)" 46 135 "MMOL/L"
          "20/01/2012 22:00" "SODIUM (#2000)" 46 133 "MMOL/L"

          Comment


          • #6
            I believe the following does what I'm guessing you want. Had you answered the questions I asked, there's some chance I'd have given you a more useful answer, most notably one that gave you a data layout that is optimal for your eventual analysis of this data. In that regard, excluding from your example the id and visit number was likely an unhelpful choice, as it prevented us from seeing whether there are multiple observations for an individual, which might matter for understanding your data.

            Code:
            * Example generated by -dataex-. To install: ssc install dataex
            clear
            input str19 start_date str66 variable byte variable_index float nval str14 units
            "23/01/2012 14:00" "SODIUM CHLORIDE 4 MEQ/ML IV SOLP" 48 85.47 "mL"    
            "16/01/2012 20:45" "CREATININE (#2009)"                7    .6 "MG/DL"
            "26/01/2012 04:30" "SODIUM (#2000)"                   46   135 "MMOL/L"
            "15/01/2012 03:15" "CREATININE (#2009)"                7   .75 "MG/DL"
            "17/01/2012 04:00" "CREATININE (#2009)"                7   .77 "MG/DL"
            "24/01/2012 03:53" "SODIUM CHLORIDE 4 MEQ/ML IV SOLP" 48 85.47 "mL"    
            "22/01/2012 22:10" "SODIUM (#2000)"                   46   134 "MMOL/L"
            "25/01/2012 03:15" "CREATININE (#2009)"                7   .72 "MG/DL"
            "25/01/2012 02:35" "SODIUM CHLORIDE 4 MEQ/ML IV SOLP" 48 85.47 "mL"    
            "23/01/2012 03:05" "CREATININE (#2009)"                7   .64 "MG/DL"
            "15/01/2012 02:54" "Intensive Care Unit (ICU)"        28     . ""      
            "19/01/2012 03:30" "CREATININE (#2009)"                7   .72 "MG/DL"
            "18/01/2012 04:05" "CREATININE (#2009)"                7   .89 "MG/DL"
            "21/01/2012 16:00" "SODIUM (#2000)"                   46   138 "MMOL/L"
            "28/01/2012 03:55" "CREATININE (#2009)"                7   .69 "MG/DL"
            "20/01/2012 23:00" "SODIUM CHLORIDE 4 MEQ/ML IV SOLP" 48 85.47 "mL"    
            "17/01/2012 10:00" "CHOLESTEROL (#2318)"               5   150 "MG/DL"
            "23/01/2012 23:30" "SODIUM (#2000)"                   46   137 "MMOL/L"
            "21/01/2012 11:00" "SODIUM (#2000)"                   46   134 "MMOL/L"
            "17/01/2012 10:00" "LDL (#2321)"                      29    85 "MG/DL"
            "20/01/2012 03:00" "CREATININE (#2009)"                7   .68 "MG/DL"
            "22/01/2012 09:00" "SODIUM CHLORIDE 4 MEQ/ML IV SOLP" 48 85.47 "mL"    
            "21/01/2012 03:00" "SODIUM (#2000)"                   46   135 "MMOL/L"
            "20/01/2012 22:00" "SODIUM (#2000)"                   46   133 "MMOL/L"
            end
            //
            // For each observed value of the variable "variable_index," create
            // an indicator variable "index*" with that value as a suffix
            levelsof variable_index, local(ivals)
            foreach v of local ivals {
               gen byte index`v' = (variable_index == `v')
            }
            // Use the contents of the variables called "variable" and "variable_index" to make variable labels.
            // Looping over observations, though not very Stata-ish, is convenient here.
            forval i = 1/`=_N' {
              local v = variable_index[`i']
              local contents = variable[`i']
              label var index`v' "`contents'"
            }
            // Check out the results.
            desc

            Comment


            • #7
              thanks; id & visit number are identifiers i can't include. there are multiple observations per patient for each test and for different tests

              Comment

              Working...
              X