  • Generating a single id row from multiple rows/observations

    Dear Stata users,

    I am analyzing a dataset of the following type, which records patients' co-morbidities such as asthma, hypertension, and diabetes.

    Regnumber asthma hypertension diabetes
    1001 0 1 0
    1001 0 0 0
    1002 0 1 0
    1002 0 0 0
    1002 0 0 0
    1004 0 0 0
    1005 0 1 0
    1005 0 0 0
    1009 0 0 1
    1009 0 0 0

    Please note that this is not a longitudinal data set but a multiple-response data set, in which a person may report more than one disease. I need to generate a new data set with a single row per subject (one row per Regnumber) that shows explicitly whether that person has each morbidity (for example, has hypertension or not). The data set I am looking for would look like this:

    Regnumber asthma hypertension diabetes
    1001 0 1 0
    1002 0 1 0
    1004 0 0 0
    1005 0 1 0
    1009 0 0 1

    Please give some advice on how I could generate the above data set.

    Thanks in advance!

  • #2
    Code:
    clear
    input int Regnumber asthma hypertension diabetes
    1001 0 1 0
    1001 0 0 0
    1002 0 1 0
    1002 0 0 0
    1002 0 0 0
    1004 0 0 0
    1005 0 1 0
    1005 0 0 0
    1009 0 0 1
    1009 0 0 0
    end
    * Sum each indicator within Regnumber, leaving one row per patient
    collapse (sum) asthma hypertension diabetes, by(Regnumber)
    This is the result.
    Code:
    . list, noobs sep(0)
    
      +-----------------------------------------+
      | Regnum~r   asthma   hypert~n   diabetes |
      |-----------------------------------------|
      |     1001        0          1          0 |
      |     1002        0          1          0 |
      |     1004        0          0          0 |
      |     1005        0          1          0 |
      |     1009        0          0          1 |
      +-----------------------------------------+

    • #3
      To add to Friedrich's excellent solution: if a patient can have a particular co-morbidity recorded on more than one row (this doesn't happen in your sample data), you might prefer "(max)" to "(sum)" in the collapse command. That way your results stay 0/1, as your example pictured them, rather than becoming counts 0/1/2/... .
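
      For illustration, here is a minimal sketch of that substitution. The four rows are made up for this example (they are not from the original post); patient 1001 reports hypertension on two rows, which is the case where the two statistics differ.

      ```stata
      clear
      * Hypothetical data: 1001 has hypertension recorded twice
      input int Regnumber asthma hypertension diabetes
      1001 0 1 0
      1001 0 1 0
      1002 0 0 1
      1002 0 0 0
      end
      * (max) takes the largest value within each Regnumber, so each indicator stays 0/1;
      * with (sum), hypertension for 1001 would be 2 instead of 1
      collapse (max) asthma hypertension diabetes, by(Regnumber)
      list, noobs sep(0)
      ```

      With 0/1 indicators and no missing values, the maximum within a patient is 1 if the condition appears on any of that patient's rows and 0 otherwise.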

      • #4
        Thank you very much, Friedrich and William. Your suggestions worked.
