store trimmed mean as a new variable

ji zhou

Join Date: Jul 2014

Posts: 46
#1

02 Aug 2017, 09:33

I have grouped data and want to calculate the 10% trimmed mean (trimming both tails) within each group and store the results in a new variable. I used the following code for the trimmed mean (it looks correct to me), but couldn't find a way to generate a new variable from the results. I'd appreciate any suggestions!

input str1 group score
A 5
A 3
A 4
A 5
A 1
A 2
A 5
A 3
A 3
A 4
A 2
A 1
B 4
B 2
B 3
B 4
B 4
B 4
B 4
B 3
B 3
B 2
B 4
B 4
B 3
B 3
C 1
C 1
C 1
C 5
C 5
C 5
C 2
C 3
C 4
C 4
end

sort group
bysort group: trimmean score, p(10)
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

02 Aug 2017, 09:45

As recommended in the FAQ, it should be underlined that the command above uses the user-written program - trimmean -, whose author is Nick Cox.

I assume you have installed it. According to its help file, shown here, "by" is not an option, but the "if" clause may perhaps be helpful to you.

Best regards,

Marcos
ji zhou

Join Date: Jul 2014

Posts: 46
#3

02 Aug 2017, 09:55

Thanks Marcos. Yes, I installed the trimmean program. I was able to get the results using "by". In the help file I retrieved by typing help trimmean in Stata, "by" is allowed. The results were shown in the Results window, but I didn't know how to generate a new variable to store them.

Nick Cox

Join Date: Mar 2014

Posts: 35724
#4

03 Aug 2017, 08:36

What Marcos is alluding to is this:

12.1 What to say about your commands and your problem

Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!

If you are using user-written commands, explain that and say where they came from: the Stata Journal, SSC, or other archives. This helps (often crucially) in explaining your precise problem, and it alerts readers to commands that may be interesting or useful to them.

Here are some examples:

I am using xtreg in Stata 13.1.

I am using estout from SSC in Stata 13.1.

So, the form of words implied is

I am using trimmean from the Stata Journal.

Modifying the original example you could do something like this:

Code:

clear
input str1 group score
A 5
A 3
A 4
A 5
A 1
A 2
A 5
A 3
A 3
A 4
A 2
A 1
B 4
B 2
B 3
B 4
B 4
B 4
B 4
B 3
B 3
B 2
B 4
B 4
B 3
B 3
C 1
C 1
C 1
C 5
C 5
C 5
C 2
C 3
C 4
C 4
end

sort group
save original , replace
statsby tmean=r(tmean10), by(group): trimmean score, p(10)
merge 1:m group using original
sort group score
list, sepby(group)

Code:


     +----------------------------------------+
     | group      tmean   score        _merge |
     |----------------------------------------|
  1. |     A        3.2       1   matched (3) |
  2. |     A        3.2       1   matched (3) |
  3. |     A        3.2       2   matched (3) |
  4. |     A        3.2       2   matched (3) |
  5. |     A        3.2       3   matched (3) |
  6. |     A        3.2       3   matched (3) |
  7. |     A        3.2       3   matched (3) |
  8. |     A        3.2       4   matched (3) |
  9. |     A        3.2       4   matched (3) |
 10. |     A        3.2       5   matched (3) |
 11. |     A        3.2       5   matched (3) |
 12. |     A        3.2       5   matched (3) |
     |----------------------------------------|
 13. |     B   3.416667       2   matched (3) |
 14. |     B   3.416667       2   matched (3) |
 15. |     B   3.416667       3   matched (3) |
 16. |     B   3.416667       3   matched (3) |
 17. |     B   3.416667       3   matched (3) |
 18. |     B   3.416667       3   matched (3) |
 19. |     B   3.416667       3   matched (3) |
 20. |     B   3.416667       4   matched (3) |
 21. |     B   3.416667       4   matched (3) |
 22. |     B   3.416667       4   matched (3) |
 23. |     B   3.416667       4   matched (3) |
 24. |     B   3.416667       4   matched (3) |
 25. |     B   3.416667       4   matched (3) |
 26. |     B   3.416667       4   matched (3) |
     |----------------------------------------|
 27. |     C      3.125       1   matched (3) |
 28. |     C      3.125       1   matched (3) |
 29. |     C      3.125       1   matched (3) |
 30. |     C      3.125       2   matched (3) |
 31. |     C      3.125       3   matched (3) |
 32. |     C      3.125       4   matched (3) |
 33. |     C      3.125       4   matched (3) |
 34. |     C      3.125       5   matched (3) |
 35. |     C      3.125       5   matched (3) |
 36. |     C      3.125       5   matched (3) |
     +----------------------------------------+


drop _merge
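
If you want to check those numbers without installing anything, here is a minimal sketch using official commands only. It assumes the floor rule given in the trimmean help, r = 1 + floor(n * p/100), and no missing values of score. The data are the same 36 observations entered as frequencies, and the names freq, r, inmean and tmean_check are made up for this example.

```stata
* Sketch only: 10% trimmed means by group using official commands alone.
* Assumes the floor rule r = 1 + floor(n * p/100) and no missing scores.
* freq, r, inmean and tmean_check are illustrative names, not from trimmean.
clear
input str1 group score freq
A 1 2
A 2 2
A 3 3
A 4 2
A 5 3
B 2 2
B 3 5
B 4 7
C 1 3
C 2 1
C 3 1
C 4 2
C 5 3
end

* Turn the frequency table back into one observation per score
expand freq
drop freq

* Within each group, sort by score and find the first rank to keep
bysort group (score): gen long r = 1 + floor(_N * 10/100)

* Flag ranks r through n - r + 1, then average the flagged scores by group
by group: gen byte inmean = inrange(_n, r, _N - r + 1)
egen double tmean_check = mean(cond(inmean, score, .)), by(group)

tabstat tmean_check, by(group) statistics(mean) nototal
```

The values of tmean_check should agree with the tmean column in the listing above. This is only a cross-check on the rule, not a replacement for trimmean, which handles options such as ceiling.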

Last edited by Nick Cox; 03 Aug 2017, 08:39.


ji zhou

Join Date: Jul 2014

Posts: 46
#5

03 Aug 2017, 09:28

Got it. Thank you Nick. Will do so when posting questions in the future.

Your code was amazing! Thank you. I wasn't aware of statsby. It's such a useful command. Thanks again.
ji zhou

Join Date: Jul 2014

Posts: 46
#6

03 Aug 2017, 09:32

Hi Nick, I have a related question about the trimmean command. What happens when there isn't much data? Say, for a 10% trim, there are fewer than 10 observations. I rarely use trimmed means, but in the case I am working on, I am required to explore the data even for groups with few observations.
Nick Cox

Join Date: Mar 2014

Posts: 35724
#7

03 Aug 2017, 09:47

This is documented in the help:

A more general rule is that the lowest value included in the calculation of the p% trimmed mean is y(r), where r = 1 + floor(n * p/100), and the highest value included is thus y(n - r + 1). The ceiling option specifies the use of ceil() rather than floor(). See Cox (2003) for more discussion and further references on floor and ceiling functions.

So if n < 10, then floor(n * 10/100) reduces to 0, r as defined here to 1, and the 10% trimmed mean reduces to the mean. Specifying ceiling would always trim one value in each tail.
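
To make the rule concrete, here is a small sketch in official Stata that works out the first and last ranks kept for a 10% trimmed mean at a few group sizes. The local macro names and the particular values of n are chosen just for illustration.

```stata
* Sketch: ranks kept by a 10% trimmed mean under floor() and ceil().
* Rule from the help: lowest rank r = 1 + floor(n * p/100), highest n - r + 1.
* n, rf, rc, hif and hic are illustrative local macro names.
foreach n of numlist 5 9 10 14 {
    local rf  = 1 + floor(`n' * 10/100)
    local rc  = 1 + ceil(`n' * 10/100)
    local hif = `n' - `rf' + 1
    local hic = `n' - `rc' + 1
    display "n = `n': floor keeps y(`rf') to y(`hif'); ceiling keeps y(`rc') to y(`hic')"
}
```

For n = 5 and n = 9 the floor rule keeps every value, so the result is the plain mean, while ceiling drops one value from each tail.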
ji zhou

Join Date: Jul 2014

Posts: 46
#8

03 Aug 2017, 11:01

Ok. Got it. Thanks!

Nick Cox

Join Date: Mar 2014

Posts: 35724
#9

03 Aug 2017, 11:57

Here is another way to do it without file choreography using rangerun (SSC: Robert Picard and friend):

Code:

program mytrim 
    trimmean score, p(10)
    gen tmean = r(tmean10) 
end

egen ngroup = group(group) 
rangerun mytrim, interval(ngroup 0 0) use(score)  

sort group score
list, sepby(group)


Robert Picard

Join Date: Mar 2014

Posts: 1536
#10

03 Aug 2017, 12:31

And since the results are the same within each group, you can avoid running the program for each observation by using an invalid interval for repeats.

Code:

program mytmean
    trimmean score, p(10)
    gen tmean2 = r(tmean10)
end

egen ngroup = group(group)
bysort ngroup: gen high = cond(_n==1, ngroup, 0)
rangerun mytmean, interval(ngroup 0 high)
ji zhou

Join Date: Jul 2014

Posts: 46
#11

19 Sep 2017, 11:45

Many thanks to Nick and Robert. I was able to run the analysis based on your suggestions and provided the results to my client.
