how to calculate the cumulative mean by groups?

Fred Lee

Join Date: Nov 2017

Posts: 473
#1

25 Oct 2021, 03:35

For example, for group 1 at time 1, the cumulative mean is missing; for group 1 at time 2, it is the average of the previous observations, namely 74; for group 1 at time 3, it is the average of the previous observations, namely the average of 74 and 85.5.

Thanks a ton in advance!

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float group byte time double x
1  1   74
1  2 85.5
1  3 83.3
1  4 83.7
1  5   53
1  6   81
1  7   72
1  8 89.9
1  9 85.3
1 10 87.5
1 12 82.8
1 13 79.2
1 15 80.8
1 16 85.2
2  1   62
2  2   73
2  3   63
2  4   63
2  5   78
2  6   68
end

Andrew Musau

Join Date: Oct 2014
Posts: 10214

25 Oct 2021, 06:11

What you want is a mean defined over a range of observations. See rangestat from SSC. I am assuming nonconsecutive time periods imply missing values.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input float group byte time double x
1  1   74
1  2 85.5
1  3 83.3
1  4 83.7
1  5   53
1  6   81
1  7   72
1  8 89.9
1  9 85.3
1 10 87.5
1 12 82.8
1 13 79.2
1 15 80.8
1 16 85.2
2  1   62
2  2   73
2  3   63
2  4   63
2  5   78
2  6   68
end

qui sum time
rangestat (mean) x, interval(time `=-`r(max)'' -1) by(group)

Res.:

Code:

. l, sepby(gr)

     +---------------------------------+
     | group   time      x      x_mean |
     |---------------------------------|
  1. |     1      1     74           . |
  2. |     1      2   85.5          74 |
  3. |     1      3   83.3       79.75 |
  4. |     1      4   83.7   80.933333 |
  5. |     1      5     53      81.625 |
  6. |     1      6     81        75.9 |
  7. |     1      7     72       76.75 |
  8. |     1      8   89.9   76.071429 |
  9. |     1      9   85.3        77.8 |
 10. |     1     10   87.5   78.633333 |
 11. |     1     12   82.8       79.52 |
 12. |     1     13   79.2   79.818182 |
 13. |     1     15   80.8   79.766667 |
 14. |     1     16   85.2   79.846154 |
     |---------------------------------|
 15. |     2      1     62           . |
 16. |     2      2     73          62 |
 17. |     2      3     63        67.5 |
 18. |     2      4     63          66 |
 19. |     2      5     78       65.25 |
 20. |     2      6     68        67.8 |
     +---------------------------------+

.
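
As a cross-check on the rangestat result, the same mean of previous observations can be sketched with built-in commands only. This is a minimal sketch, not part of the posts above: it assumes the dataex example is in memory, that x is never missing, and that time is unique within group. The variable name cm is a placeholder.

```stata
* Sketch only: assumes the dataex example is in memory and x is never missing.
* sum(x) is the running sum up to and including the current row, so
* subtracting x leaves the sum of the earlier rows in the same group.
bysort group (time): generate double cm = (sum(x) - x) / (_n - 1)
* In the first row of each group this is 0/0, which Stata evaluates to missing.
list group time x cm, sepby(group)
```

Unlike rangestat, this works on row order within the group rather than on the values of time, and sum() treats a missing x as zero, so it would need adjusting for data with missing values.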

Fred Lee

Join Date: Nov 2017
Posts: 473

25 Oct 2021, 06:20

Originally posted by Andrew Musau

What you want is a mean defined over a range of observations. See rangestat from SSC. I am assuming nonconsecutive time periods imply missing values.

Thank you! Could you please explain more about "interval(time `=-`r(max)'' -1)"? I know it's the range of observations, so what do you mean by setting "-`r(max)" and "-1"?

Andrew Musau

Join Date: Oct 2014

Posts: 10214
#4

25 Oct 2021, 06:26

From

Code:

help rangestat

interval(keyvar low high) is required and defines the interval that selects the set of observations to use to calculate result for the current observation. keyvar
is a numeric variable. Observations whose values for keyvar fall within the closed interval bounds are selected. low and high can each be specified using a
numeric variable, a # (a number in Stata parlance), or a system missing value. If a # is used, the bound for each observation is computed by adding # to
keyvar. If low is specified using a system missing value, low is set to missing for all observations. rangestat applies the same rules as inrange() for
missing bounds: if the lower bound is missing, observations will match up to and including the value of high. If both low and high are missing, all
observations will match. Note that the treatment of missing values for low and high differs in version 1.1 up from the previous version of rangestat and this
may require that previous code be adapted. (Use which to find out which version you are running if you do not know.)

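-r(max)- after summarize holds the maximum of the summarized variable, here time. So the key variable is time, the low bound is time minus max(time), and the high bound is time minus 1. The interval therefore runs from before the first period up to the previous period, i.e., the previous observation (time-1) if sorting by time and there are no holes in the panel.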
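
To make the two bounds visible, they can be written out as variables, since the help excerpt says low and high may each be a numeric variable. This is a minimal sketch under assumptions: the dataex example is freshly loaded (so x_mean does not exist yet), rangestat is installed from SSC, and the names low and high are placeholders chosen for illustration.

```stata
* Sketch only: assumes the dataex example is freshly loaded and that
* rangestat is installed (ssc install rangestat).
quietly summarize time
* Lower bound: current time minus the largest time, so it is never above
* the first period in the data.
generate low = time - r(max)
* Upper bound: the period just before the current one.
generate high = time - 1
rangestat (mean) x, interval(time low high) by(group)
list group time low high x x_mean, sepby(group)
```

Listing low and high next to x_mean shows, row by row, which window of time values feeds each mean.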
Fred Lee

Join Date: Nov 2017

Posts: 473
#5

25 Oct 2021, 06:58

Originally posted by Andrew Musau

From

Code:

help rangestat

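-r(max)- after summarize holds the maximum of the summarized variable, here time. So the key variable is time, the low bound is time minus max(time), and the high bound is time minus 1. The interval therefore runs from before the first period up to the previous period, i.e., the previous observation (time-1) if sorting by time and there are no holes in the panel.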

Thanks again! I finally understand this explanation: the interval is defined on the key variable relative to its value in the current observation, right? For example, does interval(time . .) select all observations, interval(time . -1) the observations from the first one up to the previous one, and interval(time 0 0) only the current observation? Are these understandings right?
Nick Cox

Join Date: Mar 2014

Posts: 35721
#6

25 Oct 2021, 07:24

That’s correct. In this context system missing means as large as possible, whether it’s a subtraction (e.g. looking back in time) or an addition (e.g. looking forward).
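
The three interval forms discussed above can be compared side by side. This is a minimal sketch, assuming the dataex example is freshly loaded (so x_mean does not exist yet) and rangestat is installed from SSC; the names mean_all, mean_prev and mean_now are placeholders, and rename is used only to free up the default x_mean name between calls.

```stata
* Sketch only: assumes the dataex example is freshly loaded and that
* rangestat is installed (ssc install rangestat).
* Both bounds missing: every observation in the group is used.
rangestat (mean) x, interval(time . .) by(group)
rename x_mean mean_all
* Missing low, high of -1: everything up to and including time - 1.
rangestat (mean) x, interval(time . -1) by(group)
rename x_mean mean_prev
* Both offsets zero: only observations with the same time value.
rangestat (mean) x, interval(time 0 0) by(group)
rename x_mean mean_now
list group time x mean_all mean_prev mean_now, sepby(group)
```

With time unique within group, mean_now simply repeats x, and mean_prev should match the x_mean column in the listing earlier in the thread.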
Fred Lee

Join Date: Nov 2017

Posts: 473
#7

25 Oct 2021, 07:27

Originally posted by Nick Cox

That’s correct. In this context system missing means as large as possible, whether it’s a subtraction (e.g. looking back in time) or an addition (e.g. looking forward).

Thanks Nick!