Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • picking maximum 3 values from a column.

    Dear All:

    I have a dataset with 3 variables. I want to pick the highest 3 values from one column then want to have average of those 3 values.
    For example:

    gvkey year return
    11 1990 10
    11 1990 11
    11 1990 12
    11 1990 8
    12 1990 6
    12 1990 5
    12 1990 4
    12 1990 8
    13 1990 22
    13 1990 20
    13 1990 21
    11 1991 15
    11 1991 14
    11 1991 13
    11 1991 10
    Now I want to have for gvkey 11 and year 1990 the maximum values such as 10, 11, and 12 and want have the average (10+11+12)/3=11.

    Please let me know which stata command I should use.

    Thank you.

  • #2
    Code:
    assert !missing(return)
    by gvkey year (return), sort: gen desired = (return[_N] + return[_N-1] + return[_N-2])/3
    will do this provided that there are no missing values of return. Note also that the problem itself is undefined if there are not at least 3 observations of return for each gvkey-year combination.

    If the variable return does sometimes have missing values, then it's slightly more complicated:

    Code:
    gen byte miss_ret = missing(return)
    by gvkey year miss_ret (return), sort: gen desired = (return[_N] + return[_N-1] + return[_N-2])/3
    by gvkey year: replace desired = desired[1]
    The logic of this is based on the fact that in Stata, missing values sort last (larger than every number). So in this version of the code we separately sort those observations where return is missing from those where it isn't. We calculated desired in both groups. In the group with missing values, desired comes out missing as well. But more importantly, miss_ret is 0 for the groups without missing values and 1 for the groups with missing values. So within the gvkey year group, the first value of desired is then spread to the entire gvkey year group.

    Again, though, if there are any gvkey-year combinations for which there are not at least 3 non-missing values of return, you will only get missing value as a result.

    Comment


    • #3
      Thank you for your quick reply. The codes works well. The first set of codes have worked perfectly.

      Comment


      • #4
        Here's another method. I don't especially recommend it, but the code is worth a few moments' study.

        Code:
        gsort gvkey year -return 
        bysort gvkey year : gen desired = (return[1] + return[2] + return[3])/3

        Comment

        Working...
        X