collapse observation into a single row

federico zampi

Join Date: Sep 2014

Posts: 24
#1

collapse observation into a single row

30 Oct 2014, 05:45

Dear Stata users,

I have the following dataset

input id num1 num2 num3 num4 num5
1 12 1 14 55 15
1 12 1 14 145 15
2 99 9 97 4 96
2 99 9 97 23 96
2 99 9 97 200 96
3 55 3 4 3 22
end

I noted that the dataset contains some duplicate observations in terms of all variables but num4.
Therefore, I would like to collapse these duplicate observations into a single row by summing up the value of num4.

To illustrate the final dataset should consists of three rows as the following:

input id num1 num2 num3 num4 num5
1 12 1 14 200 15
2 99 9 97 227 96
3 55 3 4 3 22
end

I'm not sure what is the quickest way to do this. Can I kindly ask you to help me with this?
Thank you in advance.
Best regards,
F
Tags: None
Rich Goldstein

Join Date: Mar 2014

Posts: 4466
#2

30 Oct 2014, 05:48

sounds like you want -collapse-; use means for variables that do not differ and sum for num4
Comment

Svend Juul

Join Date: Apr 2014
Posts: 515

30 Oct 2014, 06:35

Code:

.  collapse (sum) num4 , by(id-num3 num5)
.  list, clean
       id   num1   num2   num3   num5   num4 
  1.    1     12      1     14     15    200 
  2.    2     99      9     97     96    227 
  3.    3     55      3      4     22      3

Comment

federico zampi

Join Date: Sep 2014

Posts: 24
#4

30 Oct 2014, 06:37

Hello Rich,
Thank you for your replay.

I tried your suggestion and it worked. I used the following: collapse num1 num2 num3 (sum) num4 (mean) num5, by(id).

However, what if I have a string variable in the dataset instead of having all numeric ones?
Let's consider this dataset:

input id name num2 num3 num4 num5
1 a 1 14 55 15
1 a 1 14 145 15
2 b 9 97 4 96
2 b 9 97 23 96
2 b 9 97 200 96
3 b 3 4 3 22
end

How can I merge duplicates into a single row? It appears that collapse does not work.

Thank you for your help. Really appreciated.
F
Comment
federico zampi

Join Date: Sep 2014

Posts: 24
#5

30 Oct 2014, 06:41

I tried the code Svend suggested. The by option solves my problem.
Thank you very much. Really useful and prompt reply.

Thank you and have a lovely day,
F
Comment

Svend Juul

Join Date: Apr 2014
Posts: 515

30 Oct 2014, 06:42

Note that your input command would not work. But here we go:

Code:

 input id str1 name num2 num3 num4 num5
 1 "a" 1 14 55 15
 1 "a" 1 14 145 15
 2 "b" 9 97 4 96
 2 "b" 9 97 23 96
 2 "b" 9 97 200 96
 3 "b" 3 4 3 22
 end

 collapse (sum) num4 , by(id-num3 num5)
.  list, clean
       id   name   num2   num3   num5   num4 
  1.    1      a      1     14     15    200 
  2.    2      b      9     97     96    227 
  3.    3      b      3      4     22      3

Announcement

collapse observation into a single row

Comment

Comment

Comment

Comment

Comment