  • store trimmed mean as a new variable

    I have grouped data and want to calculate the 10% trimmed mean (trimming both tails) within each group and store the results in a new variable. I used the following code for the trimmed mean (it looks correct to me), but couldn't find a way to generate a new variable from the results. I would appreciate your suggestions!

    input str1 group score
    A 5
    A 3
    A 4
    A 5
    A 1
    A 2
    A 5
    A 3
    A 3
    A 4
    A 2
    A 1
    B 4
    B 2
    B 3
    B 4
    B 4
    B 4
    B 4
    B 3
    B 3
    B 2
    B 4
    B 4
    B 3
    B 3
    C 1
    C 1
    C 1
    C 5
    C 5
    C 5
    C 2
    C 3
    C 4
    C 4
    end


    sort group
    bysort group: trimmean score, p(10)

  • #2
    As recommended in the FAQ, it should be underlined that the command above uses the user-written program - trimmean -, whose author is Nick Cox.

    I assume you have installed it. According to its help files shown here, "by" is not an option, but the "if" clause may perhaps be helpful to you.
    Best regards,

    Marcos



    • #3
      Originally posted by Marcos Almeida
      As recommended in the FAQ, it should be underlined that the command above uses the user-written program - trimmean -, whose author is Nick Cox.

      I assume you have installed it. According to its help files shown here, "by" is not an option, but the "if" clause may perhaps be helpful to you.
      Thanks Marcos. Yes, I installed the trimmean program. I was able to get the results using "by". In the help file I retrieved by typing help trimmean in Stata, "by" is allowed. The results were shown in the Results window, but I didn't know how to generate a new variable to store them.



      • #4


        What Marcos is alluding to is this:

        12.1 What to say about your commands and your problem


        Say exactly what you typed and exactly what Stata typed (or did) in response. N.B. exactly!

        If you are using user-written commands, explain that and say where they came from: the Stata Journal, SSC, or other archives. This helps (often crucially) in explaining your precise problem, and it alerts readers to commands that may be interesting or useful to them.

        Here are some examples:
        I am using xtreg in Stata 13.1.
        I am using estout from SSC in Stata 13.1.
        So, the form of words implied is

        I am using trimmean from the Stata Journal.


        Modifying the original example you could do something like this:


        Code:
        clear
        input str1 group score
        A 5
        A 3
        A 4
        A 5
        A 1
        A 2
        A 5
        A 3
        A 3
        A 4
        A 2
        A 1
        B 4
        B 2
        B 3
        B 4
        B 4
        B 4
        B 4
        B 3
        B 3
        B 2
        B 4
        B 4
        B 3
        B 3
        C 1
        C 1
        C 1
        C 5
        C 5
        C 5
        C 2
        C 3
        C 4
        C 4
        end
        
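        sort group
        * keep a copy of the full data, because statsby replaces the data in memory
        save original , replace
        * one observation per group, holding that group's 10% trimmed mean
        statsby tmean=r(tmean10), by(group): trimmean score, p(10)
        * attach the group-level result to every original observation
        merge 1:m group using original
        sort group score
        list, sepby(group)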
        Code:
        
             +----------------------------------------+
             | group      tmean   score        _merge |
             |----------------------------------------|
          1. |     A        3.2       1   matched (3) |
          2. |     A        3.2       1   matched (3) |
          3. |     A        3.2       2   matched (3) |
          4. |     A        3.2       2   matched (3) |
          5. |     A        3.2       3   matched (3) |
          6. |     A        3.2       3   matched (3) |
          7. |     A        3.2       3   matched (3) |
          8. |     A        3.2       4   matched (3) |
          9. |     A        3.2       4   matched (3) |
         10. |     A        3.2       5   matched (3) |
         11. |     A        3.2       5   matched (3) |
         12. |     A        3.2       5   matched (3) |
             |----------------------------------------|
         13. |     B   3.416667       2   matched (3) |
         14. |     B   3.416667       2   matched (3) |
         15. |     B   3.416667       3   matched (3) |
         16. |     B   3.416667       3   matched (3) |
         17. |     B   3.416667       3   matched (3) |
         18. |     B   3.416667       3   matched (3) |
         19. |     B   3.416667       3   matched (3) |
         20. |     B   3.416667       4   matched (3) |
         21. |     B   3.416667       4   matched (3) |
         22. |     B   3.416667       4   matched (3) |
         23. |     B   3.416667       4   matched (3) |
         24. |     B   3.416667       4   matched (3) |
         25. |     B   3.416667       4   matched (3) |
         26. |     B   3.416667       4   matched (3) |
             |----------------------------------------|
         27. |     C      3.125       1   matched (3) |
         28. |     C      3.125       1   matched (3) |
         29. |     C      3.125       1   matched (3) |
         30. |     C      3.125       2   matched (3) |
         31. |     C      3.125       3   matched (3) |
         32. |     C      3.125       4   matched (3) |
         33. |     C      3.125       4   matched (3) |
         34. |     C      3.125       5   matched (3) |
         35. |     C      3.125       5   matched (3) |
         36. |     C      3.125       5   matched (3) |
             +----------------------------------------+
        
        
        drop _merge
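
        A variant of the same idea, offered only as a sketch, sends the group results to a temporary file through the saving() option of statsby, so that the data in memory are left alone and no permanent copy has to be saved. It assumes the example data above are in memory, trimmean is installed, and the lines are run together from a do-file; the macro name results is arbitrary.

        ```stata
        * assumes: example data in memory, trimmean installed
        tempfile results

        * write one observation per group (its 10% trimmed mean) to the temporary file
        statsby tmean=r(tmean10), by(group) saving(`results', replace): trimmean score, p(10)

        * every original observation picks up its group's trimmed mean
        merge m:1 group using `results', nogenerate

        sort group score
        list, sepby(group)
        ```

        Because the original observations are now the master data, the merge is m:1 rather than 1:m, and nogenerate skips the _merge variable that would otherwise need dropping.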
        Last edited by Nick Cox; 03 Aug 2017, 08:39.



        • #5
          Got it. Thank you Nick. Will do so when posting questions in the future.

          Your code was amazing! Thank you. I wasn't aware of statsby. It's such a useful command. Thanks again.



          • #6
            Hi Nick, I have a related question about the trimmean command. What happens when there isn't much data, say fewer than 10 observations with a 10% trim? I rarely use trimmed means, but in the case I am working on I am required to explore the data even for groups with few observations.



            • #7
              This is documented in the help:

              A more general rule is that the lowest value included in the calculation of the p% trimmed mean is y(r), where r = 1 +
              floor(n * p/100), and the highest value included is thus y(n - r + 1). The ceiling option specifies the use of ceil()
              rather than floor(). See Cox (2003) for more discussion and further references on floor and ceiling functions.
              So if n < 10, then floor(n * 10/100) reduces to 0, r as defined here to 1, and the 10% trimmed mean reduces to the mean. Specifying ceiling would always trim one value in each tail.
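
              To see the rule at work, the sketch below recomputes the 10% trimmed mean by hand for the example data earlier in the thread. It assumes those data are in memory and that score has no missing values; the variable names k, kept and tmean_manual are made up for illustration. It is only a check on the rule as quoted, not a replacement for trimmean.

              ```stata
              * assumes: example data in memory, score never missing
              * k = floor(n * p/100) values are dropped in each tail, with p = 10
              bysort group (score): gen k = floor(_N * 10/100)

              * keep y(r) to y(n - r + 1), where r = 1 + k
              by group: gen byte kept = inrange(_n, 1 + k, _N - k)

              * mean of the retained values, repeated on every observation in the group
              by group: egen tmean_manual = mean(cond(kept, score, .))

              list group score kept tmean_manual, sepby(group)
              ```

              With the example data, n is 12, 14 and 10, so k = 1 in every group: one value is dropped from each tail, and tmean_manual should agree with the 3.2, 3.416667 and 3.125 listed in #4.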



              • #8
                Ok. Got it. Thanks!



                • #9
                  Here is another way to do it without file choreography using rangerun (SSC: Robert Picard and friend):


                  Code:
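                  * wrapper that rangerun calls on each observation's subset of the data
                  program mytrim
                      trimmean score, p(10)
                      gen tmean = r(tmean10)
                  end

                  * rangerun needs a numeric key, so map the string group to 1, 2, 3
                  egen ngroup = group(group)
                  * interval(ngroup 0 0): each observation's subset is its own group
                  rangerun mytrim, interval(ngroup 0 0) use(score)

                  sort group score
                  list, sepby(group)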



                  • #10
                    And since the results are the same within each group, you can avoid running the program for each observation by using an invalid interval for repeats.

                    Code:
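                    program mytmean
                        trimmean score, p(10)
                        gen tmean2 = r(tmean10)
                    end
                    egen ngroup = group(group)

                    * high equals ngroup only for the first observation in each group; elsewhere
                    * high = 0 < ngroup, so the interval is invalid and the program is not run
                    bysort ngroup: gen high = cond(_n==1, ngroup, 0)
                    rangerun mytmean, interval(ngroup 0 high)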



                    • #11
                      Many thanks to Nick and Robert. I was able to run the analysis based on your suggestion and provided the results/suggestions to my client.
