  • Is egen, by() equivalent to bysort: egen?

    Hi, I am unsure whether egen with the by() option is equivalent to bysort followed by egen.

    For example, I have panel data covering multiple years, and I want the mean of income for each individual across waves.

    I have two commands:

    1. bysort id: egen incomeMean = mean(Income)
    2. egen incomeMean = mean(Income), by(id)

    Do these two commands produce the same thing? I tried both on my own sample and the results were the same, but I want to make sure the two are really equivalent.

  • #2
    They are not exactly the same.

    The first sorts your data and then calls egen, so the dataset stays sorted by the by-variables afterwards. The second sorts internally to apply the by() option, but then leaves the dataset in the sort order it found it in.

    This sequence shows the possible difference:

    Code:
    . sysuse auto, clear
    (1978 Automobile Data)
    
    . d
    
    Contains data from C:\Program Files (x86)\Stata\ado\base/a/auto.dta
      obs:            74                          1978 Automobile Data
     vars:            12                          13 Apr 2018 17:45
                                                  (_dta has notes)
    --------------------------------------------------------------------------------------------------------------
                  storage   display    value
    variable name   type    format     label      variable label
    --------------------------------------------------------------------------------------------------------------
    make            str18   %-18s                 Make and Model
    price           int     %8.0gc                Price
    mpg             int     %8.0g                 Mileage (mpg)
    rep78           int     %8.0g                 Repair Record 1978
    headroom        float   %6.1f                 Headroom (in.)
    trunk           int     %8.0g                 Trunk space (cu. ft.)
    weight          int     %8.0gc                Weight (lbs.)
    length          int     %8.0g                 Length (in.)
    turn            int     %8.0g                 Turn Circle (ft.)
    displacement    int     %8.0g                 Displacement (cu. in.)
    gear_ratio      float   %6.2f                 Gear Ratio
    foreign         byte    %8.0g      origin     Car type
    --------------------------------------------------------------------------------------------------------------
    Sorted by: foreign
    
    . sort make
    
    . egen foo = mean(mpg), by(foreign)
    
    . d, s
    
    Contains data from C:\Program Files (x86)\Stata\ado\base/a/auto.dta
      obs:            74                          1978 Automobile Data
     vars:            13                          13 Apr 2018 17:45
    Sorted by: make
         Note: Dataset has changed since last saved.
    
    . bysort foreign : egen bar = mean(mpg)
    
    . d, s
    
    Contains data from C:\Program Files (x86)\Stata\ado\base/a/auto.dta
      obs:            74                          1978 Automobile Data
     vars:            14                          13 Apr 2018 17:45
    Sorted by: foreign
         Note: Dataset has changed since last saved.
    
    .
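    If you prefer the bysort form but need the earlier sort order back, one option is to record the current order first and re-sort on it afterwards. This is a minimal sketch on the same auto data; origorder is a hypothetical helper variable name, not anything egen or bysort creates.

```stata
* Sketch: use bysort, then restore the previous order.
* origorder is a hypothetical helper variable.
sysuse auto, clear
sort make
* Record the current observation order
gen long origorder = _n
* bysort leaves the data sorted by foreign
bysort foreign: egen bar = mean(mpg)
* Put observations back in the make order
sort origorder
drop origorder
```

    Note that after drop origorder Stata no longer reports a sort order, even though the observations are physically back in make order. Run sort make again if later commands rely on the data being flagged as sorted.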


    There should be no difference in terms of the values of the variables created.
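    One way to confirm that in your own data is to create the variable both ways and compare observation by observation. A minimal sketch on the auto data, reusing the names foo and bar from the session above; assert stops with an error if any observation differs.

```stata
* Sketch: check that both forms give the same values.
sysuse auto, clear
sort make
* egen with by() sorts internally, then restores the make order
egen foo = mean(mpg), by(foreign)
* bysort leaves the data sorted by foreign
bysort foreign: egen bar = mean(mpg)
* Values travel with their observations, so the sort order does not matter here
assert foo == bar
```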
    Last edited by Nick Cox; 14 Apr 2020, 09:52.