  • How to display the 3 most reoccurring observations in a string variable (presented as median)

    Hi.
    I have a very long dataset with a number of observations for each id. Each observation has a string variable containing a procedure text; let's call it "Surgical procedure".
    I want to list (display) the 3 most frequently occurring "surgical procedures". In other words, I want the 3 most frequently performed procedures presented as median.

    Example:
    id Surgical procedure
    1 Procedure A
    1 Procedure B
    2 Procedure A
    2 Procedure C
    2 Procedure A
    3 Procedure D
    3 Procedure B
    4 Procedure A
    4 Procedure A
    4 Procedure A
    4 Procedure D
    4 Procedure B
    4 Procedure C

    I hope this is enough info to get some help.

    Thanks

  • #2
    I do not understand what you mean by "presented as median".

    If the string variable only includes the names of the procedures, you can encode and tabulate to see the frequencies.

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id str29 Surgical_procedure
    1 "Procedure A"
    1 "Procedure B"
    2 "Procedure A"
    2 "Procedure C"
    2 "Procedure A"
    3 "Procedure D"
    3 "Procedure B"
    4 "Procedure A"
    4 "Procedure A"
    4 "Procedure A"
    4 "Procedure D"
    4 "Procedure B"
    4 "Procedure D"
    end
    
    encode Surgical_procedure, gen(procedure)
    tab procedure
    Result:

    Code:
    . encode Surgical_procedure, gen(procedure)
    
    . tab procedure
    
      procedure |      Freq.     Percent        Cum.
    ------------+-----------------------------------
    Procedure A |          6       46.15       46.15
    Procedure B |          3       23.08       69.23
    Procedure C |          1        7.69       76.92
    Procedure D |          3       23.08      100.00
    ------------+-----------------------------------
          Total |         13      100.00
    Edit: In fact, encode is not necessary. You can tabulate string variables as well.
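
    A minimal sketch of that point, re-entering the same example values as above (only the string variable is needed here). It tabulates the string variable directly, and the sort option of tabulate lists the categories in descending order of frequency, so the most common procedures appear at the top:

    ```stata
    * Re-enter the example data from above (string variable only)
    clear
    input str29 Surgical_procedure
    "Procedure A"
    "Procedure B"
    "Procedure A"
    "Procedure C"
    "Procedure A"
    "Procedure D"
    "Procedure B"
    "Procedure A"
    "Procedure A"
    "Procedure A"
    "Procedure D"
    "Procedure B"
    "Procedure D"
    end

    * No encode step: tabulate accepts string variables directly.
    * sort orders the rows by descending frequency.
    tabulate Surgical_procedure, sort
    ```

    Note that tabulate still shows every category; picking out only the top 3 has to be done by eye or with a further step.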
    Last edited by Andrew Musau; 12 Feb 2020, 06:57.


    • #3
      Your data example is clear but requires surgery to be readable by Stata. Please do read and act on the request to use dataex.


      Otherwise this example can be run easily:
      Code:
      . clear
      
      . set obs 100
      number of observations (_N) was 0, now 100
      
      . set seed 2803
      
      . gen test = word("frog toad newt dragon lizard", runiformint(1,5))
      
      . tab test, sort
      
             test |      Freq.     Percent        Cum.
      ------------+-----------------------------------
             toad |         23       23.00       23.00
           dragon |         20       20.00       43.00
             frog |         20       20.00       63.00
             newt |         19       19.00       82.00
           lizard |         18       18.00      100.00
      ------------+-----------------------------------
            Total |        100      100.00
      
      . groups test, order(hi) select(3)
      
        +----------------------------------+
        |   test   Freq.   Percent     %<= |
        |----------------------------------|
        |   toad      23     23.00   23.00 |
        | dragon      20     20.00   43.00 |
        |   frog      20     20.00   63.00 |
        +----------------------------------+
      except that you must install groups from the Stata Journal.


      SJ-18-1 st0496_1: Software update for groups (help groups if installed). N. J. Cox. Q1/18, SJ 18(1):291.
      groups exited with an error message if weights were specified; this has been corrected.

      SJ-17-3 st0496: Speaking Stata: Tables as lists: The groups command (help groups if installed). N. J. Cox. Q3/17, SJ 17(3):760--773.
      Presents a command for listing group frequencies and percents and cumulations thereof; for various subsetting and ordering by frequencies, percents, and so on; for reordering of columns; and for saving tabulated data to new datasets.

      Software download from st0496_1 is free. The 2017 paper is accessible by subscription (until 2020Q3, when the paywall will be removed).

      See also https://www.statalist.org/forums/for...updated-on-ssc which is free.


      I can't see that this has anything to do with medians. The values shown could all be called modes.
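
      If installing groups is not an option, the top 3 can also be sketched with official commands only. This is an alternative to the groups call above, not the same command: it collapses the data to one row per distinct value with contract, sorts by descending frequency with gsort, and lists the first three rows. The variable name n is an arbitrary choice for illustration, and ties are broken alphabetically here simply because test is the second sort key.

      ```stata
      * Same simulated data as in the example above
      clear
      set obs 100
      set seed 2803
      gen test = word("frog toad newt dragon lizard", runiformint(1,5))

      * contract replaces the data in memory, so keep a copy to restore
      preserve

      * One observation per distinct value, with its frequency in n
      contract test, freq(n)

      * Highest frequency first; ties ordered alphabetically by test
      gsort -n test

      * Show only the three most frequent values
      list in 1/3, noobs

      restore
      ```

      If several values tie for third place, list in 1/3 silently shows only one of them, so it is worth checking the full sorted list when frequencies are close.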


      • #4
        Thanks to both of you! Solved my problem!
