  • Finding the observation number of the smallest/largest value in a data set.

    Suppose I have data on a variable x, and I want to find the observation numbers associated with the minimum and maximum values in the data. I know I can get the min and max easily and then write a loop over the observations. Is there an easier solution?


  • #2
    The trick here is to use an intermediate variable to flag the observations that are equal to the min/max. However, you'll have to decide what you want to do if multiple observations share the min/max value. The code below records the observation number of every such observation, so you could list them all if needed.

    Code:
    sysuse auto
    keep mpg
    summ mpg, meanonly
    
    gen byte which_min = cond(mpg == r(min), _n, .)
    gen byte which_max = cond(mpg == r(max), _n, .)
    
    * min and max summary statistics now indicate first and last obs number for the min and max values
    summ which_min
    summ which_max



    • #3
      Well, holding observation number in a byte will undoubtedly bite (*) you hard with a large dataset.

      In general,

      Code:
      gen long obsno = _n 
      
      summarize foo, meanonly 
      
      levelsof obsno if foo == r(min)
      and so forth.


      (*) Worst joke on Statalist since 1994.
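
      One caveat when extending this to the maximum: levelsof is itself r-class, so it replaces the results left behind by summarize, and r(max) is no longer available after the first levelsof call. A minimal sketch, assuming the auto data with mpg standing in for foo (the scalar names s_min and s_max and the local names are illustrative only):

      Code:
      sysuse auto, clear
      gen long obsno = _n

      summarize mpg, meanonly

      * save the extremes before any other r-class command overwrites r()
      scalar s_min = r(min)
      scalar s_max = r(max)

      * collect every observation number at the min and at the max (ties included)
      levelsof obsno if mpg == s_min, local(min_obs)
      levelsof obsno if mpg == s_max, local(max_obs)

      display "min at obs: `min_obs'"
      display "max at obs: `max_obs'"

      Scalars are used rather than local macros here so that the saved values keep full double precision when the variable is not integer-valued.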



      • #4
        Originally posted by Nick Cox
        Well, holding observation number in a byte will undoubtedly bite (*) you hard with a large dataset.

        (*) Worst joke on Statalist since 1994.
        True on both counts.
        Indeed long (or c(obs_t)) is the more appropriate type.
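
        A sketch of the c(obs_t) variant, assuming a Stata version recent enough to provide that c-class value. It expands to the smallest storage type that can hold the current number of observations, so the type is chosen when the variable is created:

        Code:
        sysuse auto, clear

        * c(obs_t) is evaluated now, based on the current _N
        gen `c(obs_t)' obsno = _n

        * check which storage type was actually used
        describe obsno

        If observations might be appended later, declaring the variable as long up front is the safer choice.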



        • #5
          This problem was also discussed in the Stata Journal in 2006:

          https://journals.sagepub.com/doi/pdf...867X0600600313

          https://journals.sagepub.com/doi/pdf...867X0600600414
