Generating a single id row from multiple rows/observations

Tanvir Hasan

Join Date: Jun 2014

Posts: 5
#1

15 Dec 2015, 07:13

Dear Stata users,

I am analyzing a dataset of the following type, which records patients' co-morbidities such as asthma, hypertension, and diabetes.

Regnumber asthma hypertension diabetes
1001 0 1 0
1001 0 0 0
1002 0 1 0
1002 0 0 0
1002 0 0 0
1004 0 0 0
1005 0 1 0
1005 0 0 0
1009 0 0 1
1009 0 0 0

Please note that this is not a longitudinal data set but a multiple-response data set, in which a person may report more than one disease. I need to generate a new data set with a single row per subject that shows explicitly whether the person has each morbidity (for example, hypertension or not). The data set I am looking for would look like this:

Regnumber asthma hypertension diabetes
1001 0 1 0
1002 0 1 0
1004 0 0 0
1005 0 1 0
1009 0 0 1

Please give some advice on how I could generate the above data set.

Thanks in advance!

Friedrich Huebler

Join Date: Apr 2014
Posts: 1053

15 Dec 2015, 07:26

Code:

clear
input int Regnumber asthma hypertension diabetes
1001 0 1 0
1001 0 0 0
1002 0 1 0
1002 0 0 0
1002 0 0 0
1004 0 0 0
1005 0 1 0
1005 0 0 0
1009 0 0 1
1009 0 0 0
end
* One row per Regnumber; each indicator becomes the number of rows in which it was 1
collapse (sum) asthma hypertension diabetes, by(Regnumber)

This is the result.

Code:

. list, noobs sep(0)

  +-----------------------------------------+
  | Regnum~r   asthma   hypert~n   diabetes |
  |-----------------------------------------|
  |     1001        0          1          0 |
  |     1002        0          1          0 |
  |     1004        0          0          0 |
  |     1005        0          1          0 |
  |     1009        0          0          1 |
  +-----------------------------------------+


William Lisowski

Join Date: Dec 2014

Posts: 10150
#3

15 Dec 2015, 11:08

To Friedrich's excellent solution I would add only one caution. If a patient can have the same co-morbidity recorded on more than one row (this doesn't happen in your sample data), you may prefer to substitute "(max)" for "(sum)" in the collapse command, so that the results are 0/1 values as in your example rather than counts of 0/1/2/... .
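
A minimal sketch of that substitution. The data below are a made-up variant of the sample in which patient 1001 reports hypertension on two rows; that duplicated row is an assumption added for illustration and is not in the original data.

```stata
clear
input int Regnumber asthma hypertension diabetes
1001 0 1 0
1001 0 1 0
1002 0 1 0
1002 0 0 0
1009 0 0 1
1009 0 0 0
end

* (max) keeps each indicator at 0/1: it is 1 if any row for the patient is 1.
* With (sum), hypertension for patient 1001 would come out as 2 instead.
collapse (max) asthma hypertension diabetes, by(Regnumber)
list, noobs sep(0)
```

For data like the original sample, where no co-morbidity repeats within a patient, (sum) and (max) give the same result.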
Tanvir Hasan

Join Date: Jun 2014

Posts: 5
#4

15 Dec 2015, 18:13

Thank you very much, Friedrich and William. Your suggestions worked.