Hi all,
I have a dataset where there is duplicates observation and this has a different price.
Duplicates in terms of station edate
--------------------------------------
copies | observations surplus
----------+---------------------------
1 | 41595 0
2 | 2222 1111
3 | 168 112
4 | 16 12
--------------------------------------
. i.e. I have a variable "price" that is repeated twice in some days (where days is represented by variable "edate" ).
What I want to do. I want to do a mean between this two prices and get just one observation. However, at the same time I want to keep other variables.
I show you a small piece of my dataset
station brand fuel service price edate
1064 Q8 diesel CS 1.719 10sep2012
1064 Q8 diesel CS 1.719 11sep2012
1064 Q8 diesel CS 1.719 12sep2012
1064 Q8 diesel CS 1.719 13sep2012
1064 Q8 diesel CS 1.719 14sep2012
1064 Q8 diesel CS 1.719 15sep2012
1064 Q8 diesel CS 1.719 17sep2012
1064 Q8 diesel CS 1.719 18sep2012
1064 Q8 diesel CS 1.619 21sep2012
1064 Q8 diesel CS 1.719 21sep2012
1064 Q8 diesel CS 1.719 22sep2012
1064 Q8 diesel CS 1.719 24sep2012
1064 Q8 diesel CS 1.719 25sep2012
1064 Q8 diesel CS 1.719 26sep2012
1064 Q8 diesel CS 1.619 27sep2012
1064 Q8 diesel CS 1.719 27sep2012
1064 Q8 diesel CS 1.719 28sep2012
1064 Q8 diesel CS 1.719 29sep2012
I tried with the following command:
.sort edate station
.collapse (mean) price, by (edate station)
and I yield this output:
station edate price
1064 07sep2012 1.749
1064 08sep2012 1.719
1064 10sep2012 1.719
1064 11sep2012 1.719
1064 12sep2012 1.719
1064 13sep2012 1.719
1064 14sep2012 1.719
1064 15sep2012 1.719
1064 17sep2012 1.719
1064 18sep2012 1.719
1064 21sep2012 1.669
1064 22sep2012 1.719
1064 24sep2012 1.719
1064 25sep2012 1.719
1064 26sep2012 1.719
1064 27sep2012 1.669
1064 28sep2012 1.719
1064 29sep2012 1.719
the mean is correct, but I lost others variable (brand, fuel, service)
Can you help me?
I apologize for unclear English.
I have a dataset where there is duplicates observation and this has a different price.
Duplicates in terms of station edate
--------------------------------------
copies | observations surplus
----------+---------------------------
1 | 41595 0
2 | 2222 1111
3 | 168 112
4 | 16 12
--------------------------------------
. i.e. I have a variable "price" that is repeated twice in some days (where days is represented by variable "edate" ).
What I want to do. I want to do a mean between this two prices and get just one observation. However, at the same time I want to keep other variables.
I show you a small piece of my dataset
station brand fuel service price edate
1064 Q8 diesel CS 1.719 10sep2012
1064 Q8 diesel CS 1.719 11sep2012
1064 Q8 diesel CS 1.719 12sep2012
1064 Q8 diesel CS 1.719 13sep2012
1064 Q8 diesel CS 1.719 14sep2012
1064 Q8 diesel CS 1.719 15sep2012
1064 Q8 diesel CS 1.719 17sep2012
1064 Q8 diesel CS 1.719 18sep2012
1064 Q8 diesel CS 1.619 21sep2012
1064 Q8 diesel CS 1.719 21sep2012
1064 Q8 diesel CS 1.719 22sep2012
1064 Q8 diesel CS 1.719 24sep2012
1064 Q8 diesel CS 1.719 25sep2012
1064 Q8 diesel CS 1.719 26sep2012
1064 Q8 diesel CS 1.619 27sep2012
1064 Q8 diesel CS 1.719 27sep2012
1064 Q8 diesel CS 1.719 28sep2012
1064 Q8 diesel CS 1.719 29sep2012
I tried with the following command:
.sort edate station
.collapse (mean) price, by (edate station)
and I yield this output:
station edate price
1064 07sep2012 1.749
1064 08sep2012 1.719
1064 10sep2012 1.719
1064 11sep2012 1.719
1064 12sep2012 1.719
1064 13sep2012 1.719
1064 14sep2012 1.719
1064 15sep2012 1.719
1064 17sep2012 1.719
1064 18sep2012 1.719
1064 21sep2012 1.669
1064 22sep2012 1.719
1064 24sep2012 1.719
1064 25sep2012 1.719
1064 26sep2012 1.719
1064 27sep2012 1.669
1064 28sep2012 1.719
1064 29sep2012 1.719
the mean is correct, but I lost others variable (brand, fuel, service)
Can you help me?
I apologize for unclear English.
Comment