How to create a new variable containing the mean from other variables

Al Bothwell

Join Date: Apr 2015

Posts: 149
#1

20 Dec 2018, 14:48

I have panel data (extract shown below). For each subject, I would like to compute a single mean of bp_count over their visits. The number of study visits varies by subject.
Then, I would like to create a new variable containing the mean of bp_count for each subject. Suggestions would be appreciated.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id visit bp_count)
 7 27 3
 7 18 2
 7 36 2
 7 30 2
 7  2 2
 7 12 3
 7 33 2
 7  9 2
 7 24 3
 7 39 2
 7  1 3
 7 15 3
 7  3 2
 7 42 2
 7  6 2
 7 51 2
 7 21 2
 7 48 2
 7  0 3
 7 45 2
10 33 1
10 24 2
10  3 1
10  6 2
10 21 1
10 27 2
10 36 2
10  2 1
10  1 1
10  0 1
end

Happy holidays,
Al Bothwell

David Benson

Join Date: Oct 2018
Posts: 489

20 Dec 2018, 15:41

Hi Al,

Not sure what you mean by "For each subject, I would like to compute a single mean of bp_count over their visits." But creating the mean of bp_count per subject is easy enough.

Also note that, after the first few visits (0, 1, 2, 3), visits seem to increment by 3.

Code:

sort id visit
by id: gen obs_no = _n  // creating a counter for each id (needed by the -list- commands)
. list id visit bp_count if obs_no <=10, noobs

  +-----------------------+
  | id   visit   bp_count |
  |-----------------------|
  |  7       0          3 |
  |  7       1          3 |
  |  7       2          2 |
  |  7       3          2 |
  |  7       6          2 |
  |-----------------------|
  |  7       9          2 |
  |  7      12          3 |
  |  7      15          3 |
  |  7      18          2 |
  |  7      21          2 |
  |-----------------------|
  | 10       0          1 |
  | 10       1          1 |
  | 10       2          1 |
  | 10       3          1 |
  | 10       6          2 |
  |-----------------------|
  | 10      21          1 |
  | 10      24          2 |
  | 10      27          2 |
  | 10      33          1 |
  | 10      36          2 |
  +-----------------------+

egen mean_bp_count = mean(bp_count), by(id)  // creating mean of bp_count by id
tabstat bp_count, by(id) stats(n mean median min max)  // just checking the answer

Summary for variables: bp_count
     by categories of: id

      id |         N      mean       p50       min       max
---------+--------------------------------------------------
       7 |        20       2.3         2         2         3
      10 |        10       1.4         1         1         2
---------+--------------------------------------------------
   Total |        30         2         2         1         3
------------------------------------------------------------


. list if obs_no <=10, noobs abbrev(14)

  +------------------------------------------------+
  | id   visit   bp_count   mean_bp_count   obs_no |
  |------------------------------------------------|
  |  7       0          3             2.3        1 |
  |  7       1          3             2.3        2 |
  |  7       2          2             2.3        3 |
  |  7       3          2             2.3        4 |
  |  7       6          2             2.3        5 |
  |------------------------------------------------|
  |  7       9          2             2.3        6 |
  |  7      12          3             2.3        7 |
  |  7      15          3             2.3        8 |
  |  7      18          2             2.3        9 |
  |  7      21          2             2.3       10 |
  |------------------------------------------------|
  | 10       0          1             1.4        1 |
  | 10       1          1             1.4        2 |
  | 10       2          1             1.4        3 |
  | 10       3          1             1.4        4 |
  | 10       6          2             1.4        5 |
  |------------------------------------------------|
  | 10      21          1             1.4        6 |
  | 10      24          2             1.4        7 |
  | 10      27          2             1.4        8 |
  | 10      33          1             1.4        9 |
  | 10      36          2             1.4       10 |
  +------------------------------------------------+

Al Bothwell

Join Date: Apr 2015

Posts: 149
#3

21 Dec 2018, 13:25

Thank you David, your solution is exactly what I needed.

Happy Holidays,
Al Bothwell

River Huang

Join Date: Mar 2016

Posts: 1908
#4

21 Dec 2018, 17:53

It seems to me that David's code can be simplified as

Code:

bys id (visit): egen mean_bp_count1 = mean(bp_count)

Ho-Chuan (River) Huang
Stata 19.0, MP(4)

Nick Cox

Join Date: Mar 2014

Posts: 35724
#5

21 Dec 2018, 18:39

River Huang: Your syntax is equivalent to David's. The by() option is no longer documented, but many users learned about it when it was documented and they've passed that knowledge on through postings here.

In fact specifying sorting within id according to visit is unnecessary but harmless. The mean is the same regardless of the sort order of the values.
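A minimal sketch of that equivalence, assuming a four-row subset of Al's example data. The variable names mean_by_option, mean_by_prefix and mean_by_prefix2 are made up for illustration and do not appear in the posts above.

```stata
clear
input float(id visit bp_count)
 7 27 3
 7 18 2
10 33 1
10 24 2
end

* by() option, as in David's post
egen mean_by_option = mean(bp_count), by(id)

* by prefix with no secondary sort on visit
bysort id: egen mean_by_prefix = mean(bp_count)

* by prefix with the (harmless) secondary sort, as in River's post
bysort id (visit): egen mean_by_prefix2 = mean(bp_count)

* -assert- stops with an error if the three versions ever disagree
assert mean_by_option == mean_by_prefix
assert mean_by_prefix == mean_by_prefix2
```

If both -assert- lines pass silently, the three forms produced the same per-subject mean on these rows.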

River Huang

Join Date: Mar 2016

Posts: 1908
#6

21 Dec 2018, 19:13

Dear Nick, Thanks for the message.

Ho-Chuan (River) Huang
Stata 19.0, MP(4)

Joro Kolev

Join Date: Aug 2018

Posts: 3050
#7

22 Dec 2018, 06:01

Originally posted by Nick Cox

River Huang: Your syntax is equivalent to David's. The by() option is no longer documented, but many users learned about it when it was documented and they've passed that knowledge on through postings here.

In fact specifying sorting within id according to visit is unnecessary but harmless. The mean is the same regardless of the sort order of the values.

Nick, thank you for explaining that -egen, by()- is no longer documented. This issue struck me once when somebody told me "I have to sort anyway". I then read the manual and noticed that the difference between -by varname: egen- and -egen, by(varname)- is not explained. I just assumed that the manual's authors did not make the distinction because they, like you, think that the two are equivalent.

I think that the two are not equivalent in a pretty major (to me at least) way:

1. -bysort varname: egen- is not sort preserving; it leaves your data in the order determined by the sort.

2. -egen, by(varname)- on the other hand is sort preserving. It restores your data to whichever sort order they were in before you executed the command.
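The two points above can be checked with a small sketch, assuming four rows of Al's example data entered in a deliberately unsorted order. The variables orig_order, mean_a and mean_b are hypothetical helpers introduced only for this illustration.

```stata
clear
input float(id visit bp_count)
 7 27 3
10 33 1
 7 18 2
10 24 2
end
gen long orig_order = _n        // hypothetical helper: records the starting row order

* by() option: the rows should still be in their starting order afterwards
egen mean_a = mean(bp_count), by(id)
assert orig_order == _n         // errors out if the row order has changed
list, noobs

* bysort prefix: the data are sorted by id first and stay that way
bysort id: egen mean_b = mean(bp_count)
list, noobs
describe, short                 // shows the current sort order of the data
```

Comparing the two listings shows whether the rows moved. The computed means themselves are the same under either syntax.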

Nick Cox

Join Date: Mar 2014

Posts: 35724
#8

22 Dec 2018, 07:13

Joel: You are correct. There is a difference in that using the by() option won’t change the sort order of the data.

Last edited by Nick Cox; 22 Dec 2018, 07:15.

Joro Kolev

Join Date: Aug 2018

Posts: 3050
#9

22 Dec 2018, 08:01

Originally posted by Nick Cox

Joel: You are correct. There is a difference in that using the by() option won’t change the sort order of the data.

Joel is a different name, Nick :P. Joro is short for Gueorgui (= Георги).

More importantly: Is there any fascinating history behind the decision of StataCorp to relegate the -egen, by(varname)- syntax to undocumented?

Why do they no longer want us to use the ancient syntax, and presumably want us all to migrate to the new one? (If they are not documenting the ancient one, presumably they found something they don't like about it.)

In my personal history of using Stata, I learned the -egen, by(varname)- syntax from my first teacher of panel data econometrics Stepan Jurajda somewhere around year 2001.

In the next few years until 2004, I was reasonably successful in manipulating panel data without ever re-sorting the data. I wrote a pretty decent paper using the British Household Panel Study, again, without ever re-sorting the data and just using -egen, by(varname)- as a tool.

Nick Cox

Join Date: Mar 2014

Posts: 35724
#10

22 Dec 2018, 08:21

Sorry; “Joel” was an iPhone autocorrect I failed to spot.

I don’t think there is any hidden issue beyond StataCorp trying to standardize syntax across commands.