  • Creating a comorbidity variable

    Hi everyone,

    This might be a simple question. I am currently trying to create a cardiovascular comorbidity variable (which encompasses various heart problems). However, the final variable created does not seem to encompass what I want it to capture (i.e., it seems to be missing some of the cases). Specifically, I wanted to create a 'heart_problems' variable, which would equal '1' for all cases where the participants endorse at least one of the cardiovascular problems (i.e., cm029 or cm038 or cm033 or cm034 or cm041 ) and 0 otherwise.

    Here is the code I used (where 2 represents "No" and 1 represents "Yes"):
    Code:
    generate heart_problems = .
    replace heart_problems = 0 if cm029_==2 | cm038_==2 | cm033_==2 | cm034_==2 | cm041_==2
    replace heart_problems = 1 if cm029_==1 | cm038_==1 | cm033_==1 | cm034_==1 | cm041_==1
    Here is a sample of my dataset
    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float newid int(cm029 cm038 cm033 cm034 cm041)
     1 2 2 2 2 2
     1 2 2 . . 2
     1 2 2 . . 2
     2 2 2 2 2 2
     2 2 2 . . 2
     2 2 2 . . 2
     3 2 2 2 2 2
     3 2 2 . . 2
     3 2 2 . . 2
     4 2 2 2 2 2
     4 . . . . .
     4 . . . . .
     5 2 2 2 2 2
     5 . . . . .
     5 . . . . .
     6 2 2 2 2 2
     6 2 2 . . 2
     6 . . . . .
     7 2 2 2 2 2
     7 2 2 . . 2
     7 . . . . .
     8 2 2 2 2 2
     8 . . . . .
     8 2 2 . . 2
     9 2 2 2 2 2
     9 . . . . .
     9 . . . . .
    10 2 2 2 2 2
    10 2 2 . . 2
    10 2 2 . . 2
    11 2 2 2 2 2
    11 2 2 . . 2
    11 2 2 . . 2
    12 2 2 2 2 2
    12 2 2 . . 2
    12 2 2 . . 2
    13 2 2 2 2 2
    13 2 2 . . 2
    13 2 2 . . 2
    14 2 2 2 2 2
    14 2 2 . . 2
    14 2 2 . . 2
    15 2 2 2 2 2
    15 2 2 . . 2
    15 2 2 . . 2
    16 2 2 2 2 2
    16 2 2 . . 2
    16 2 2 . . 2
    17 2 2 2 2 2
    17 2 2 . . 2
    17 . . . . .
    18 2 2 1 1 2
    18 2 2 2 1 2
    18 2 2 . . 2
    19 2 2 2 2 2
    19 2 2 . . 2
    19 2 2 . . 2
    20 2 2 2 2 2
    20 2 2 . . 2
    20 2 2 . . 2
    21 2 2 2 2 2
    21 2 2 . . 2
    21 2 1 . . 1
    22 2 2 2 2 2
    22 2 2 . . 2
    22 2 2 . . 2
    23 2 2 2 2 2
    23 2 2 . . 2
    23 2 2 . . 2
    24 2 2 1 1 2
    24 2 2 . . 2
    24 2 2 . . 2
    25 2 2 2 2 2
    25 2 2 . . 2
    25 2 2 . . 2
    26 2 2 1 1 2
    26 2 2 . . 2
    26 2 2 . . 2
    27 2 2 2 2 2
    27 2 2 . . 2
    27 2 2 . . 2
    28 2 2 2 2 2
    28 2 2 . . 2
    28 2 2 . . 2
    29 2 2 2 2 2
    29 2 2 . . 2
    29 2 2 . . 2
    30 2 2 2 2 2
    30 . . . . .
    30 . . . . .
    31 2 2 2 2 2
    31 . . . . .
    31 2 2 . . 2
    32 2 2 2 2 2
    32 2 2 . . 2
    32 2 2 . . 2
    33 2 2 1 1 2
    33 2 2 . . 2
    33 2 2 . . 2
    34 2 2 2 2 2
    end
    label values cm029_ _vl1483
    label def _vl1483 2 "2 No", modify
    label values cm038_ _vl1510
    label def _vl1510 1 "1 Yes", modify
    label def _vl1510 2 "2 No", modify
    label values cm033_ _vl1487
    label def _vl1487 1 "1 Yes", modify
    label def _vl1487 2 "2 No", modify
    label values cm034_ _vl1488
    label def _vl1488 1 "1 Yes", modify
    label def _vl1488 2 "2 No", modify
    label values cm041_ _vl1513
    label def _vl1513 1 "1 Yes", modify
    label def _vl1513 2 "2 No", modify

    Any help would be greatly appreciated!

  • #2
    Something is wrong with your -dataex- output. The variables it creates have no _ at the end, whereas all of the labeling commands and your subsequent commands do. -dataex- has been around long enough that I doubt that it has a bug. I suspect that you modified your -dataex- output before pasting it here. Please never do that. In this case, the way to fix it is obvious, but in other circumstances edits can make important changes to the resulting data set that would cause it to behave differently from your real data set, leading you to get unworkable solution proposals to your questions.

    I don't see what is wrong with your code given your description of what you want to do. You say you are missing some cases. Can you point out a case in your example data that your code misses? If not, please post back with a new data example that does contain a missed case.
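
    One way to hunt for a missed case, sketched here on a few made-up rows rather than on the real data: count the Yes answers in each row with egen's anycount() function and list any row where that count is positive but heart_problems is not 1. The variable name n_yes and the five input rows are assumptions for illustration only.

```stata
* Hypothetical five-row example; 1 = Yes, 2 = No, . = missing
clear
input int(cm029_ cm038_ cm033_ cm034_ cm041_)
2 2 2 2 2
2 2 . . 2
2 1 . . 1
2 2 1 1 2
. . . . .
end

* The code from post #1
generate heart_problems = .
replace heart_problems = 0 if cm029_==2 | cm038_==2 | cm033_==2 | cm034_==2 | cm041_==2
replace heart_problems = 1 if cm029_==1 | cm038_==1 | cm033_==1 | cm034_==1 | cm041_==1

* Count the Yes (1) answers in each row (n_yes is a hypothetical name)
egen n_yes = anycount(cm029_ cm038_ cm033_ cm034_ cm041_), values(1)

* Any row listed here has at least one Yes but was not coded 1
list if n_yes > 0 & heart_problems != 1

* Cross-tabulate the count against the result, showing missing values too
tabulate n_yes heart_problems, missing
```

    If the -list- command shows nothing on the real data, the code is catching every Yes, and the apparent gap probably lies elsewhere, for example in how missing responses are being counted.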

    That said, although I believe your code produces correct results, it is not how I would go about it. First, Stata is much easier to use if yes/no variables are coded 1 = Yes 0 = No. So before I did anything else, I would recode all of those variables in that way:
    Code:
    recode cm029_ cm038_ cm034_ cm041_ (2 = 0)
    label define boolean  0 "No" 1 "Yes"
    label values cm029_ cm038_ cm034_ cm041_ boolean
    Then I would code:
    Code:
    egen heart_problems = rowmax(cm029_ cm038_ cm034_ cm041_)
    This will set heart_problems to 1 if any of the four variables is coded 1 (Yes). It will be missing if all four variables are missing. And it will be 0 if there are no 1 responses and at least one 0 (No) response.
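
    To see those three outcomes side by side, here is a minimal sketch on made-up data. The variable names q1 through q4 and any_yes are hypothetical stand-ins for the recoded 0/1 items and the result.

```stata
* Hypothetical 0/1 items: 1 = Yes, 0 = No, . = missing
clear
input byte(q1 q2 q3 q4)
0 0 1 0
0 0 . .
. . . .
end

* rowmax() ignores missing values and returns missing only when
* every variable in the list is missing
egen any_yes = rowmax(q1 q2 q3 q4)

list
```

    Row 1 contains a Yes, so any_yes is 1. Row 2 has only No answers among its non-missing values, so any_yes is 0. Row 3 has no answers at all, so any_yes is missing.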

    In truth, I would go a bit farther than that. If a person responds, let us say, No to two of these variables and gives no response to the other two, the code so far (yours and mine both) will record a No for heart problems. But really, we don't know whether that person has heart problems, because two of the variables are missing. Ideally, I would want heart_problems to be missing in that situation. There is a trick that makes this fairly easy: temporarily recode missing values to 0.5, so that rowmax() returns 0.5 whenever there is no Yes but at least one answer is missing, and then set the 0.5 values back to missing. After recoding to 0/1 as above, use this in place of the -egen- command I suggested:
    Code:
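    * Temporarily treat missing as 0.5, which lies between No (0) and Yes (1)
    recode cm029_ cm038_ cm034_ cm041_ (. = 0.5)
    egen heart_problems = rowmax(cm029_ cm038_ cm034_ cm041_)
    * Restore the missing values, and set the result to missing where it is 0.5
    recode cm029_ cm038_ cm034_ cm041_ heart_problems (.5 = .)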
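
    The same stricter rule can also be sketched without the temporary 0.5 values, by counting missing answers with egen's rowmiss() function and blanking out any 0 that rests on incomplete answers. This is an alternative formulation, not the approach above; the names q1 through q4, any_yes, and n_miss are hypothetical, and the items are assumed to be already coded 0/1.

```stata
* Hypothetical 0/1 items: 1 = Yes, 0 = No, . = missing
clear
input byte(q1 q2 q3 q4)
0 0 0 0
0 0 . .
1 0 . .
. . . .
end

* 1 if any Yes, 0 if no Yes among the answered items, missing if none answered
egen any_yes = rowmax(q1 q2 q3 q4)

* Number of unanswered items in each row
egen n_miss = rowmiss(q1 q2 q3 q4)

* A No is only trustworthy when every item was answered
replace any_yes = . if any_yes == 0 & n_miss > 0

list
```

    A Yes still counts even when other items are missing (row 3), because a single Yes is enough to establish the condition. Only rows that would otherwise be coded 0 are affected.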


    • #3
      Thank you so very much; this worked perfectly!

      Apologies for the dataex confusion. I was going for a cleaner look. I will make sure to keep the variable names consistent going forward. Thank you again.
