
  • egen MY_VAR = mean(MY_VAR), by(group)

    Is there a nice way in Stata to develop a group mean of a variable and keep the original name? (The original variable is already a cluster-level variable and should preferably retain its name; the idea is to let all individuals in a group get a value on the group variable after appending data.)

    I end up with something like...

    Code:
    rename my_variable my_variable_x
    egen   my_variable = mean(my_variable_x), by(group)
    drop   my_variable_x
    ... which is okay with one or two variables, but once you do this for 10 variables or so, the code gets really ugly and probably error-prone too.

    I'm looking for single-line code. This should be doable with Stata's concept of local macros, I guess?
    Last edited by Christopher Bratt; 23 Jun 2021, 04:08.
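A loop over a variable list gets close to the single-line idea. The following is only a sketch under stated assumptions: the toy data and the names my_var1 and my_var2 are hypothetical stand-ins for the real cluster-level variables. It writes the group mean into a temporary variable and then uses replace, so each variable keeps its name (and its labels and position in the dataset).

```stata
* Toy data: hypothetical cluster-level variables, filled in for only
* one observation per group (as might happen after appending data).
clear
input group my_var1 my_var2
1 10 .
1 .  .
1 .  5
2 20 7
2 .  .
end

* Overwrite each variable with its group mean, keeping the original name.
foreach v of varlist my_var1 my_var2 {
    tempvar m                          // fresh temporary name on each pass
    egen `m' = mean(`v'), by(group)    // group mean, ignoring missing values
    replace `v' = `m'                  // overwrite in place
    drop `m'
}

* Every observation in a group now carries the group's value.
assert my_var1 == 10 if group == 1
assert my_var2 == 7  if group == 2
list, sepby(group)
```

Because replace is used instead of rename, egen and drop, adding an eleventh variable only means adding its name to the varlist.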

  • #2
    ereplace by Chris Larkin is on SSC. The first draft was, if I recall correctly, written by me in response to a similar query here and the flavour was very much you can do this with a command written for that purpose, but watch out.

    I feel your frustration here but just need to flag that overwriting your data with summary statistics is a drastic step, and not reversible unless you read the dataset in again.

    I have countless times realised that I need to drop an egen result and try again, but I prefer that pain to realising that I've overwritten my data.
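For readers who want to try ereplace while keeping a way back, here is a minimal sketch. It assumes ereplace has been installed from SSC and that it accepts the same syntax as egen, as described above; the auto dataset and the backup name price_orig are illustrative choices only, not part of the original posts.

```stata
* One-time install, if needed:  ssc install ereplace
sysuse auto, clear

* Keep an untouched copy first, so the overwrite is reversible
* within the session (price_orig is a hypothetical name).
clonevar price_orig = price

* Same syntax as egen (assumed), but the target variable already exists.
ereplace price = mean(price), by(foreign)

* Compare the overwritten variable with the backup.
list foreign price price_orig in 1/5
```

Once the result has been checked, the backup can be dropped; until then the original values remain available without reading the dataset in again.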



    • #3
      Thanks Nick. Always helpful and I encounter you again and again in tips I find on the internet.

      Concerning overwriting data:
      I ALWAYS start with the raw data, in every session (several times a day when I am coding). This approach is now quite common in R, and I also use it in Stata. It's part of the paradigm of reproducible research. (I get your point on overwriting existing variable names, but in this case I have decided to do so. The code is there to inspect, whether by me or by others if the work is published.)



      • #4
        Thanks for this. I am clear that you understand the pluses and minuses here.



        • #5
          I also feel that not having ereplace is a pain, and I will check the user-contributed command.

          But -egen, mean()- is a two-liner, so if you want to go advanced you can just do the same without egen.

          Code:
          egen y = mean(x), by(id)
          is equivalent to

          Code:
          bysort id: gen y = sum(x)/sum(!missing(x))
          by id: replace y = y[_N]
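The equivalence can be checked on a shipped dataset. This is a sketch, assuming the auto data and using rep78 because it contains missing values; the names y_egen and y_gen are illustrative. sum() is a running sum that treats missing values as zero, so within each group the last observation holds the total of x divided by the count of nonmissing x, and the second line copies that value to the whole group.

```stata
sysuse auto, clear

* Group mean via egen (stored as double for a clean comparison).
egen double y_egen = mean(rep78), by(foreign)

* Same thing by hand: running sum over running count, then spread
* the last (complete) value across the group.
bysort foreign: gen double y_gen = sum(rep78)/sum(!missing(rep78))
by foreign: replace y_gen = y_gen[_N]

* The two results agree up to floating-point precision.
assert reldif(y_egen, y_gen) < 1e-12
```

If a group has no nonmissing values, the hand-written version evaluates 0/0 and returns missing, which is also what a mean over no values should be.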



          • #6
            Originally posted by Nick Cox
            ereplace by Chris Larkin is on SSC. The first draft was, if I recall correctly, written by me in response to a similar query here and the flavour was very much you can do this with a command written for that purpose, but watch out.

            I feel your frustration here but just need to flag that overwriting your data with summary statistics is a drastic step, and not reversible unless you read the dataset in again.

            I have countless times realised that I need to drop an egen result and try again, but I prefer that pain to realising that I've overwritten my data.
            ereplace (SSC) was absolutely written by Nick! I just wrote the help file, submitted it to SSC (with Nick's permission), and agreed to maintain it if errors or issues arise.
