Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • collapse and at the same time keep other variables

    Hi all,

    I have a dataset where there is duplicates observation and this has a different price.

    Duplicates in terms of station edate

    --------------------------------------
    copies | observations surplus
    ----------+---------------------------
    1 | 41595 0
    2 | 2222 1111
    3 | 168 112
    4 | 16 12
    --------------------------------------

    . i.e. I have a variable "price" that is repeated twice in some days (where days is represented by variable "edate" ).
    What I want to do. I want to do a mean between this two prices and get just one observation. However, at the same time I want to keep other variables.

    I show you a small piece of my dataset

    station brand fuel service price edate
    1064 Q8 diesel CS 1.719 10sep2012
    1064 Q8 diesel CS 1.719 11sep2012
    1064 Q8 diesel CS 1.719 12sep2012
    1064 Q8 diesel CS 1.719 13sep2012
    1064 Q8 diesel CS 1.719 14sep2012
    1064 Q8 diesel CS 1.719 15sep2012
    1064 Q8 diesel CS 1.719 17sep2012
    1064 Q8 diesel CS 1.719 18sep2012
    1064 Q8 diesel CS 1.619 21sep2012
    1064 Q8 diesel CS 1.719 21sep2012

    1064 Q8 diesel CS 1.719 22sep2012
    1064 Q8 diesel CS 1.719 24sep2012
    1064 Q8 diesel CS 1.719 25sep2012
    1064 Q8 diesel CS 1.719 26sep2012
    1064 Q8 diesel CS 1.619 27sep2012
    1064 Q8 diesel CS 1.719 27sep2012

    1064 Q8 diesel CS 1.719 28sep2012
    1064 Q8 diesel CS 1.719 29sep2012

    I tried with the following command:
    .sort edate station
    .collapse (mean) price, by (edate station)

    and I yield this output:

    station edate price
    1064 07sep2012 1.749
    1064 08sep2012 1.719
    1064 10sep2012 1.719
    1064 11sep2012 1.719
    1064 12sep2012 1.719
    1064 13sep2012 1.719
    1064 14sep2012 1.719
    1064 15sep2012 1.719
    1064 17sep2012 1.719
    1064 18sep2012 1.719
    1064 21sep2012 1.669
    1064 22sep2012 1.719
    1064 24sep2012 1.719
    1064 25sep2012 1.719
    1064 26sep2012 1.719
    1064 27sep2012 1.669
    1064 28sep2012 1.719
    1064 29sep2012 1.719

    the mean is correct, but I lost others variable (brand, fuel, service)

    Can you help me?

    I apologize for unclear English.

  • #2
    Just put the extra variables inside the by(). That's my guess.

    Comment


    • #3
      Dear Nick,
      I tried to do what you suggested to me.
      However, this command does not work well.
      Nothing has changed

      Comment


      • #4
        -gcollapse- and -duplicates- might accomplish something similar to what you want. I would try something like:

        Code:
        gen price_mean = price
        drop price
        gcollapse (mean) price_mean , merge replace by(station edate)
        duplicates drop station edate , force
        Alternatively, I think -egen- will work here too.

        Code:
        egen price_mean = mean(price) , by(station edate)
        duplicates drop station edate , force
        Here is an example using the auto dataset:

        Code:
        // Generate mean price by mpg & foreign
        sysuse auto , clear
        sort foreign mpg
        gen price_mean = price
        gcollapse (mean) price_mean , merge replace by(mpg foreign)
        
        // Drop duplicates in terms of mpg & foreign
        duplicates drop mpg foreign , force
        
        // Alternative using egen
        sysuse auto , clear
        sort foreign mpg
        egen price_mean = mean(price) , by(mpg foreign)
        duplicates drop mpg foreign , force
        Note, if you have a very large dataset (> 1 million obs) I think -gcollapse- offers some speed improvements over -egen-. Might want to try both and see which works best for you.


        Comment


        • #5
          Give us an example dataset using dataex and someone will try to help. Please follow FAQ Advice #12.

          Comment


          • #6
            The first code works well. Really thanks
            gen price_mean = price drop price gcollapse (mean) price_mean , merge replace by(station edate) duplicates drop station edate , force

            Comment

            Working...
            X