  • How to create a dummy variable per patient with multiple observations

    Hi,

    I have a dataset with over 1000 patients who each had multiple blood tests measuring their blood glucose level. I want to determine how many of the patients have had ≥1 hypoglycaemic episode (blood glucose below 2.6). A patient can have multiple blood tests (up to 48) and likewise may have multiple hypoglycaemic episodes, but I want to create a variable that indicates whether they had a hypoglycaemic episode or not. I thought about a dummy variable but can't figure out how to approach it at the moment. Below is a data example where the variable should show that patients 1, 3 and 4 had at least one blood test < 2.6 and patient 2 did not.
    Patient Blood_test1 Blood_test2 Blood_test3 Blood_test4 Blood_test5
    1 2.5 2.7 1.6 2.6 3.0
    2 2.8 2.9 2.7
    3 1.7 1.3 2.5 2.8
    4 1.4 2.7 2.7
    I'm sorry if it's an elementary question and would appreciate any kind of help.

    Many thanks in advance,
    Kristoffer



  • #2
    There is a small precision issue if your blood_test? variables are float (as you didn't use dataex, we can't be sure).


    Code:
    * Example generated by -dataex-. For more info, type help dataex
    clear
    input byte patient float(blood_test1 blood_test2 blood_test3 blood_test4) byte blood_test5
    1 2.5 2.7 1.6 2.6 3
    2 2.8 2.9 2.7   . .
    3 1.7 1.3 2.5 2.8 .
    4 1.4 2.7 2.7   . .
    end
    
    gen islow = 0
    
    * add 1 for each test below the threshold; missing values compare as not low
    forval j = 1/5 {
          replace islow = islow + (blood_test`j' < float(2.6))
    }
    
    list
    
         +------------------------------------------------------------------------+
         | patient   blood_~1   blood_~2   blood_~3   blood_~4   blood_~5   islow |
         |------------------------------------------------------------------------|
      1. |       1        2.5        2.7        1.6        2.6          3       2 |
      2. |       2        2.8        2.9        2.7          .          .       0 |
      3. |       3        1.7        1.3        2.5        2.8          .       3 |
      4. |       4        1.4        2.7        2.7          .          .       1 |
         +------------------------------------------------------------------------+
    If your blood variables are double, just use 2.6 not float(2.6).
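
    The precision issue can be seen directly with display. This is a minimal sketch using only official Stata functions; it needs no data in memory.

    ```stata
    * 2.6 has no exact binary representation, so a float holds a value
    * slightly smaller than the double-precision literal 2.6
    * shows 0: the float and the double literal differ
    display float(2.6) == 2.6
    * shows 1: a reading stored as float 2.6 compares as below the literal 2.6
    display float(2.6) < 2.6
    * shows the value a float variable actually holds
    display %20.18f float(2.6)
    ```

    This is why a test of exactly 2.6 stored as float would be wrongly counted as low if compared with plain 2.6.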

    This code counts the low readings, but an indicator variable can then be created by

    Code:
    gen ind_low = islow > 0
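
    A possible alternative that avoids the loop, sketched with official Stata only: take each patient's lowest reading with egen's rowmin() and compare it with the threshold. The names min_bg and ind_low2 are made up for illustration, and the test variables are assumed to be float as in the example data.

    ```stata
    clear
    input byte patient float(blood_test1 blood_test2 blood_test3 blood_test4) byte blood_test5
    1 2.5 2.7 1.6 2.6 3
    2 2.8 2.9 2.7   . .
    3 1.7 1.3 2.5 2.8 .
    4 1.4 2.7 2.7   . .
    end

    * lowest non-missing reading per patient (missing if every test is missing)
    egen float min_bg = rowmin(blood_test*)

    * 1 if the lowest reading is below 2.6, 0 otherwise; a missing minimum
    * compares as not low because missing sorts above every number in Stata
    gen byte ind_low2 = min_bg < float(2.6)
    ```

    Because the wildcard picks up every blood_test variable, the same two lines should work with up to 48 tests. If the variables are double, egen double and plain 2.6 would be the matching choices.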



    • #3
      Another approach uses the community-contributed extensions to the functions accepted by the egen command. You can install these using
      Code:
      ssc install egenmore
      Then, the code to generate the variable ind_low in #2 is just one line:

      Code:
      clear
      input byte patient float(blood_test1 blood_test2 blood_test3 blood_test4) byte blood_test5
      1 2.5 2.7 1.6 2.6 3
      2 2.8 2.9 2.7   . .
      3 1.7 1.3 2.5 2.8 .
      4 1.4 2.7 2.7   . .
      end
      
      egen byte ind_low = rany(blood_test*), cond(@<float(2.6))
      If you want the count (i.e. the variable islow in #2), just substitute rcount for rany in the code above.



      • #4
        Thanks to Hemanshu Kumar for the mention of egenmore. In turn I want to flag a comment in the help file:

        "From Stata 7, foreach provides an alternative that would now be considered better style:"
        which was written by me and was a comment on (in this case) my own additions to egen.

        Naturally, anyone is allowed to disagree on style and/or to use the code in question if they prefer!
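
        For readers wondering what a foreach version might look like for the problem in this thread, here is a minimal sketch. It is not the code from the egenmore help file, which is not reproduced here; it assumes the example data from #2 with float test variables.

        ```stata
        clear
        input byte patient float(blood_test1 blood_test2 blood_test3 blood_test4) byte blood_test5
        1 2.5 2.7 1.6 2.6 3
        2 2.8 2.9 2.7   . .
        3 1.7 1.3 2.5 2.8 .
        4 1.4 2.7 2.7   . .
        end

        * start at 0 and switch to 1 as soon as any test is below the threshold
        gen byte ind_low = 0
        foreach v of varlist blood_test* {
            replace ind_low = 1 if `v' < float(2.6)
        }
        ```

        Missing tests never switch the indicator on, since missing compares as larger than any number.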
