egen MY_VAR = mean(MY_VAR), by(group)

Christopher Bratt

Join Date: May 2019

Posts: 144
#1

23 Jun 2021, 03:41

Is there a nice way in Stata to compute the group mean of a variable and keep the original name? (The original variable is already a cluster-level variable and should preferably retain its name; the idea is that, after appending data, every individual in a group gets the group's value on that variable.)

I end up with something like...

Code:

rename my_variable my_variable_x
egen my_variable = mean(my_variable_x), by(group)
drop my_variable_x

... which is okay with one or two variables, but once you do this for 10 variables or so, the code becomes really ugly and probably error-prone.

I'm looking for single-line code. This should be doable with Stata's concept of local macros, I guess?
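
A minimal sketch of how a loop over a variable list might compress the three-line pattern, using a temporary variable so nothing has to be renamed. The toy data and the names x1, x2 and group are hypothetical, not taken from the actual dataset.

```stata
* Toy data (hypothetical): each cluster-level value is present on one row per group
clear
input group x1 x2
1 10 .
1 .  3
2 .  4
2 .  .
2 6  .
end

* Overwrite each listed variable with its group mean
foreach v of varlist x1 x2 {
    tempvar gm
    egen `gm' = mean(`v'), by(group)
    replace `v' = `gm'
    drop `gm'
}
```

Because each variable is overwritten with replace rather than dropped and recreated, its variable label and position in the dataset are kept.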

Last edited by Christopher Bratt; 23 Jun 2021, 04:08.
Nick Cox

Join Date: Mar 2014

Posts: 35780
#2

23 Jun 2021, 04:10

ereplace by Chris Larkin is on SSC. The first draft was, if I recall correctly, written by me in response to a similar query here and the flavour was very much you can do this with a command written for that purpose, but watch out.

I feel your frustration here but just need to flag that overwriting your data with summary statistics is a drastic step, and not reversible unless you read the dataset in again.

I have countless times realised that I need to drop an egen result and try again, but I prefer that pain to realising that I've overwritten my data.
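
For reference, a minimal sketch of the ereplace route, assuming the command accepts the same syntax as egen. The toy data and the name my_variable_orig are hypothetical; the clonevar line is just one way to keep the original values at hand before they are overwritten.

```stata
* Community-contributed command; needs an internet connection to install from SSC
ssc install ereplace, replace

* Toy data (hypothetical): the cluster-level value is present on one row per group
clear
input group my_variable
1 3.5
1 .
2 .
2 7
2 .
end

* Keep a copy of the original values before overwriting them
clonevar my_variable_orig = my_variable

* Assumed to mirror egen's syntax: overwrite my_variable with its group mean
ereplace my_variable = mean(my_variable), by(group)
```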
Christopher Bratt

Join Date: May 2019

Posts: 144
#3

23 Jun 2021, 04:25

Thanks Nick. Always helpful and I encounter you again and again in tips I find on the internet.

Concerning overwriting data:
I ALWAYS start with the raw data, in every session (several times a day when I am coding). This approach is now quite common in R, and I also use it in Stata; it's part of the paradigm of reproducible research. (I get your point on overwriting existing variable names, but in this case I have decided to do so. The code is there to inspect, whether by me or by others if the work is published.)
Nick Cox

Join Date: Mar 2014

Posts: 35780
#4

23 Jun 2021, 04:52

Thanks for this. I am clear that you understand the pluses and minuses here.
Joro Kolev

Join Date: Aug 2018

Posts: 3050
#5

23 Jun 2021, 04:53

I also feel that not having ereplace is a pain, and I will check the user-contributed command.

But -egen, mean()- amounts to a two-liner, so if you want to go advanced you can do the same without egen.

Code:

egen y = mean(x), by(id)

is equivalent to

Code:

bysort id: gen y = sum(x)/sum(!missing(x))
by id: replace y = y[_N]
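
A quick check of that equivalence on toy data (hypothetical values, including a group whose values are all missing). The names y_egen and y_manual are chosen here only so both results can sit side by side.

```stata
* Toy data (hypothetical): group 2 has no non-missing values
clear
input id x
1 2
1 4
1 .
2 .
2 .
3 5
end

* Version 1: egen
egen y_egen = mean(x), by(id)

* Version 2: running sum divided by running count, then spread the last value
bysort id: gen y_manual = sum(x) / sum(!missing(x))
by id: replace y_manual = y_manual[_N]

* Both versions should agree, with missing where a group has no data
assert y_egen == y_manual
```

In the all-missing group the manual version computes 0/0, which Stata evaluates to missing, matching what egen returns.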
Chris Larkin

Join Date: Apr 2016

Posts: 296
#6

10 Jul 2021, 05:25

Originally posted by Nick Cox

ereplace by Chris Larkin is on SSC. The first draft was, if I recall correctly, written by me in response to a similar query here and the flavour was very much you can do this with a command written for that purpose, but watch out.

I feel your frustration here but just need to flag that overwriting your data with summary statistics is a drastic step, and not reversible unless you read the dataset in again.

I have countless times realised that I need to drop an egen result and try again, but I prefer that pain to realising that I've overwritten my data.

ereplace (SSC) was absolutely written by Nick! I just wrote the help file, submitted it to SSC (with Nick's permission), and agreed to maintain it if errors or issues arise.