collapse and at the same time keep other variables

Mario Marra

Join Date: Oct 2018

Posts: 26
#1

collapse and at the same time keep other variables

24 Oct 2018, 08:03

Hi all,

I have a dataset where there is duplicates observation and this has a different price.

Duplicates in terms of station edate

--------------------------------------
copies | observations surplus
----------+---------------------------
1 | 41595 0
2 | 2222 1111
3 | 168 112
4 | 16 12
--------------------------------------

. i.e. I have a variable "price" that is repeated twice in some days (where days is represented by variable "edate" ).
What I want to do. I want to do a mean between this two prices and get just one observation. However, at the same time I want to keep other variables.

I show you a small piece of my dataset

station brand fuel service price edate
1064 Q8 diesel CS 1.719 10sep2012
1064 Q8 diesel CS 1.719 11sep2012
1064 Q8 diesel CS 1.719 12sep2012
1064 Q8 diesel CS 1.719 13sep2012
1064 Q8 diesel CS 1.719 14sep2012
1064 Q8 diesel CS 1.719 15sep2012
1064 Q8 diesel CS 1.719 17sep2012
1064 Q8 diesel CS 1.719 18sep2012
1064 Q8 diesel CS 1.619 21sep2012
1064 Q8 diesel CS 1.719 21sep2012
1064 Q8 diesel CS 1.719 22sep2012
1064 Q8 diesel CS 1.719 24sep2012
1064 Q8 diesel CS 1.719 25sep2012
1064 Q8 diesel CS 1.719 26sep2012
1064 Q8 diesel CS 1.619 27sep2012
1064 Q8 diesel CS 1.719 27sep2012
1064 Q8 diesel CS 1.719 28sep2012
1064 Q8 diesel CS 1.719 29sep2012

I tried with the following command:
.sort edate station
.collapse (mean) price, by (edate station)

and I yield this output:

station edate price
1064 07sep2012 1.749
1064 08sep2012 1.719
1064 10sep2012 1.719
1064 11sep2012 1.719
1064 12sep2012 1.719
1064 13sep2012 1.719
1064 14sep2012 1.719
1064 15sep2012 1.719
1064 17sep2012 1.719
1064 18sep2012 1.719
1064 21sep2012 1.669
1064 22sep2012 1.719
1064 24sep2012 1.719
1064 25sep2012 1.719
1064 26sep2012 1.719
1064 27sep2012 1.669
1064 28sep2012 1.719
1064 29sep2012 1.719

the mean is correct, but I lost others variable (brand, fuel, service)

Can you help me?

I apologize for unclear English.
Tags: None
Nick Cox

Join Date: Mar 2014

Posts: 35703
#2

24 Oct 2018, 08:25

Just put the extra variables inside the by(). That's my guess.
Comment
Mario Marra

Join Date: Oct 2018

Posts: 26
#3

24 Oct 2018, 08:43

Dear Nick,
I tried to do what you suggested to me.
However, this command does not work well.
Nothing has changed
Comment

Giovanni Colitti

Join Date: Feb 2016
Posts: 41

24 Oct 2018, 09:19

-gcollapse- and -duplicates- might accomplish something similar to what you want. I would try something like:

Code:

gen price_mean = price
drop price
gcollapse (mean) price_mean , merge replace by(station edate)
duplicates drop station edate , force

Alternatively, I think -egen- will work here too.

Code:

egen price_mean = mean(price) , by(station edate)
duplicates drop station edate , force

Here is an example using the auto dataset:

Code:

// Generate mean price by mpg & foreign
sysuse auto , clear
sort foreign mpg
gen price_mean = price
gcollapse (mean) price_mean , merge replace by(mpg foreign)

// Drop duplicates in terms of mpg & foreign
duplicates drop mpg foreign , force

// Alternative using egen
sysuse auto , clear
sort foreign mpg
egen price_mean = mean(price) , by(mpg foreign)
duplicates drop mpg foreign , force

Note, if you have a very large dataset (> 1 million obs) I think -gcollapse- offers some speed improvements over -egen-. Might want to try both and see which works best for you.

Comment

Nick Cox

Join Date: Mar 2014

Posts: 35703
#5

24 Oct 2018, 09:40

Give us an example dataset using dataex and someone will try to help. Please follow FAQ Advice #12.
Comment
Mario Marra

Join Date: Oct 2018

Posts: 26
#6

24 Oct 2018, 10:09

The first code works well. Really thanks
gen price_mean = price drop price gcollapse (mean) price_mean , merge replace by(station edate) duplicates drop station edate , force
1 like
Comment

Announcement

collapse and at the same time keep other variables

Comment

Comment

Comment

Comment

Comment