
  • How to count the number of observations at the min and max value?

    Given the data set, find the average education level in the sample. What are the lowest and highest years of education?

    Code:
    summarize educ
    This gives me the min and max values of the variable "educ". How can I count the number of observations at the min and max values?

    I know one method is: (0 is the min value)

    Code:
    list educ if educ==0
    Then I can count the listed observations by hand.

    Is there a better method, so that Stata gives me the number automatically?

  • #2
    Okay, I know I can use count.


    • #3
      I think the -count- command can help.
      Code:
      sysuse auto,clear
      sum rep78
      count if rep78==`r(min)'
      2B or not 2B, that's a question!


      • #4
        Originally posted by Liu Qiang
        I think the -count- command can help.
        Code:
        sysuse auto,clear
        sum rep78
        count if rep78==`r(min)'
        Thanks, however,

        Code:
        count if rep78==`r(max)'
        invalid syntax


        • #5
          This works for me:

          Code:
          sysuse auto, clear
          summarize rep78
          count if rep78 == r(min)
          count if rep78 == r(max)
          No need to use the local macro persona here. Just use r-class results directly.


          • #6
            Originally posted by Yao Zhao

            Thanks, however,

            Code:
            count if rep78==`r(max)'
            invalid syntax
            See:
            Code:
            sysuse auto,clear
            sum rep78
            count if rep78==`r(min)'
            sum rep78
            count if rep78==`r(max)'
            
            sysuse auto,clear
            sum rep78
            local min=`r(min)'
            local max=`r(max)'
            count if rep78==`min'
            count if rep78==`max'
            I was wrong in my previous comment. My guess is that `r(max)' or `r(min)' must be used directly after the sum command: after running -count-, the macros have been cleared. Anyway, as always, Nick gives better advice, from which I learn a lot. Please follow his advice. I am really grateful to him for pointing out my mistakes every time.
            Last edited by Liu Qiang; 06 Jun 2019, 00:53.


            • #7
              Local macros can be used only once? Really not so.

              But why the syntax in #4 doesn't work is a good question, and I don't have an answer.


              • #8
                The error in #4 happens because all the stored results (including the macros) left behind by the summarize command are cleared by the count command.
                Code:
                sysuse auto,clear
                sum rep78
                
                di `r(min)'
                di `r(max)'
                di r(min)
                di r(max)
                
                count
                
                di `r(min)'
                di `r(max)'
                di r(min)
                di r(max)


                • #9
                  Sorry, #5 is wrong: Romalpa Akzo in #8 is right. I can say that her version was what I first thought of, but then I thought of using r(max) directly and it seemed to work. However, the answer was wrong, so I should have looked more carefully.


                  Code:
                  . sysuse auto, clear
                  (1978 Automobile Data)
                  
                  . summarize rep78
                  
                      Variable |        Obs        Mean    Std. Dev.       Min        Max
                  -------------+---------------------------------------------------------
                         rep78 |         69    3.405797    .9899323          1          5
                  
                  . ret li
                  
                  scalars:
                                    r(N) =  69
                                r(sum_w) =  69
                                 r(mean) =  3.405797101449275
                                  r(Var) =  .9799658994032396
                                   r(sd) =  .9899322701090412
                                  r(min) =  1
                                  r(max) =  5
                                  r(sum) =  235
                  
                  . count if rep78 == r(min)
                    2
                  
                  . ret li
                  
                  scalars:
                                    r(N) =  2
                  
                  . count if rep78 == r(max)
                    5
                  
                  . count if rep78 == `r(max)'
                  invalid syntax
                  r(198);
                  
                  . count if rep78 == 5
                    11
                  count doesn't complain at the direct use of r(max) after a previous count, but the answer it produces is wrong. So, what is the r(max) it is using? That seems to be something hidden in the code for count, or does anyone else have a better story?
                  Last edited by Nick Cox; 06 Jun 2019, 03:40.


                  • #10
                    I guess that, after being cleared, r(max) evaluates to (numeric) missing, while `r(max)' is just an empty string.
                    Code:
                    sysuse auto,clear
                    sum rep78
                    
                    . count if rep78 == r(min)
                      2
                    
                    . di r(max)
                    
                    . assert r(max) ==.
                    
                    . assert r(max) !=.
                    assertion is false
                    r(9);
                    Last edited by Romalpa Akzo; 06 Jun 2019, 04:18.


                    • #11
                      Yes, that's it. Excellent diagnosis.

                      It is a two-step:

                      r(max) is not defined after the count, so evaluated as numeric missing.

                      So, in context.

                      Code:
                      count if rep78 == r(max) 
                      is equivalent to

                      Code:
                      count if rep78 == . 
                      and the answer to that is indeed 5.

                      But `r(max)' is an empty string if r(max) is numerically missing, because Stata's defaults for missing are . if numeric and "" if string.
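
                      A minimal sketch of one way around this, assuming the same auto data as above: copy the r() results into named scalars immediately after summarize, before count replaces them. The scalar names rep_min and rep_max are made up for illustration; the scalar() pseudofunction is used so that Stata does not look for a variable of that name.

                      ```stata
                      sysuse auto, clear
                      quietly summarize rep78

                      * count is itself r-class and replaces the r() results of summarize,
                      * so save what is needed first (rep_min and rep_max are hypothetical names)
                      scalar rep_min = r(min)
                      scalar rep_max = r(max)

                      * each count leaves only r(N) behind, but the scalars persist
                      count if rep78 == scalar(rep_min)
                      count if rep78 == scalar(rep_max)
                      ```

                      With the counts shown in #9, this should report 2 observations at the minimum and 11 at the maximum.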


                      • #12
                        Thank you for the helpful discussion in #10 and #11. I had not even realized the difference between r-class results and local macros.


                        • #13
                          What's the meaning of ret li?


                          • #14
                            ret li means return list

                            help ret

                            Code:
                             Return results for general commands, stored in r()
                            
                            (...)
                            
                            Results of calculations are stored by many Stata commands so that they can be easily accessed and substituted into subsequent commands.
