  • generating different mean values for sub-groups within the same variable

    Hello,

    I am trying to generate separate mean and SD values of returns for two different stocks in my data set. I came up with the following code, but it's not working. Any help with fixing it would be appreciated!

    Code:
    bysort stock : sum return
    bysort stock : gen return_mean= r(mean)
    bysort stock : gen return_sdev = r(sd)


  • #2
    j



    • #3
      Found the solution:
      Code:
      egen return_mean = mean(return), by(instrument_name)



      • #4
        For newer Stata users reading this thread, here's a quick explanation of why Leonie's code in #1 does not work -- specifically in terms of how -by- works, compared to how loops would work.

        When you run a summarize with a by, like
        Code:
        bysort stock : sum return
        summarize runs once per group, and each run overwrites the previous results in r(), so only the last group's values remain afterwards. Because r(mean) is a scalar, not a variable, it can hold only one value at a time.
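
        A quick way to check this yourself is sketched below. It uses Stata's bundled auto dataset rather than the data from #1, so price and foreign are stand-ins for return and stock. If r() keeps only the last group's results, the two displayed means should match.
        Code:
        sysuse auto, clear
        * summarize runs once per group; r() is overwritten each time
        bysort foreign: summarize price
        display r(mean)
        * compare with the mean of the last group only (foreign == 1)
        summarize price if foreign == 1
        display r(mean)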

        So, when you run
        Code:
        bysort stock : sum return
        bysort stock : gen return_mean = r(mean)
        the last mean calculated by the summarize will be used for the "gen return_mean" line, and all values of return_mean will be the same. In fact, the "bysort stock" for the generate line does not even do anything useful because nothing within that generate code differs by stock or will be grouped by stock.

        What the above code does is, "find the means for each of the groups of stocks. Then, put the mean for the last group of stocks into return_mean for all observations."

        But what I believe Leonie means by the above code is, "find the mean of the first group of stocks, and put it in the variable return_mean for the observations that correspond to the first group of stocks; then find the mean of the second group of stocks, and put it in the variable return_mean for the observations that correspond to the second group of stocks; etc."

        To get what Leonie wants while still relying on summarize and r(mean), you wouldn't use the -bysort- prefix. Instead, you would use a loop over the stocks. For example (untested, and assuming stock is a string variable):
        Code:
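        gen return_mean = .
        * collect the distinct values of stock in a local macro
        levelsof stock, local(stocks)
        foreach s of local stocks {
            * the mean for this stock only is left in r(mean)
            quietly summarize return if stock == "`s'"
            replace return_mean = r(mean) if stock == "`s'"
        }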
        However, Leonie's solution with -egen- is a quicker way to solve this, and it avoids the need for a loop.
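
        Since #1 asked for the standard deviation as well as the mean, here is a minimal sketch of the -egen- approach covering both. It assumes the grouping variable is called stock, as in #1 (the solution in #3 uses instrument_name instead), and it is untested on the original data.
        Code:
        * group mean and standard deviation, repeated on every observation of the group
        egen return_mean = mean(return), by(stock)
        egen return_sdev = sd(return), by(stock)
        With -egen-, the by() option does the grouping inside a single command, so nothing has to pass through r() at all.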
