How to display the 3 most recurring observations in a string variable (presented as median)

Sigurd Sloth

Join Date: Jun 2019

Posts: 13
#1

12 Feb 2020, 05:32

Hi.
I have a very long dataset with several observations for each id. Each observation has a string variable containing the procedure text; let's call it "Surgical procedure".
I want to list (display) the 3 most frequently recurring surgical procedures. In other words, I want the 3 most frequently performed procedures, presented as a median.

Example:
id Surgical procedure
1 Procedure A
1 Procedure B
2 Procedure A
2 Procedure C
2 Procedure A
3 Procedure D
3 Procedure B
4 Procedure A
4 Procedure A
4 Procedure A
4 Procedure D
4 Procedure B
4 Procedure C

I hope this is enough information to get some help.

Thanks

Andrew Musau

Join Date: Oct 2014
Posts: 10176

12 Feb 2020, 06:50

I do not understand what you mean by

presented as median

If the string variable contains only the names of the procedures, you can encode it and then tabulate it to see the frequency of each procedure.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float id str29 Surgical_procedure
1 "Procedure A"
1 "Procedure B"
2 "Procedure A"
2 "Procedure C"
2 "Procedure A"
3 "Procedure D"
3 "Procedure B"
4 "Procedure A"
4 "Procedure A"
4 "Procedure A"
4 "Procedure D"
4 "Procedure B"
4 "Procedure D"
end

encode Surgical_procedure, gen(procedure)
tab procedure

Results:

Code:

. encode Surgical_procedure, gen(procedure)

. tab procedure

  procedure |      Freq.     Percent        Cum.
------------+-----------------------------------
Procedure A |          6       46.15       46.15
Procedure B |          3       23.08       69.23
Procedure C |          1        7.69       76.92
Procedure D |          3       23.08      100.00
------------+-----------------------------------
      Total |         13      100.00

Edit: In fact, encode is not necessary. You can tabulate string variables as well.
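To illustrate tabulating the string variable directly and then keeping only the top 3, here is a minimal Stata sketch. It assumes the dataex example above; the count variable name `n` is our own choice, not something from the thread. Note that `contract` replaces the data in memory with one row per distinct procedure, so wrap it in `preserve`/`restore` if you still need the original observations.

```stata
* Sketch: list the 3 most frequent procedures without installing groups.
* Assumes the dataex example above; the count variable name n is our choice.
clear
input float id str29 Surgical_procedure
1 "Procedure A"
1 "Procedure B"
2 "Procedure A"
2 "Procedure C"
2 "Procedure A"
3 "Procedure D"
3 "Procedure B"
4 "Procedure A"
4 "Procedure A"
4 "Procedure A"
4 "Procedure D"
4 "Procedure B"
4 "Procedure D"
end

* tabulate works directly on a string variable; sort orders rows by frequency
tab Surgical_procedure, sort

* contract replaces the data in memory with one row per distinct
* procedure and stores its count in n (use preserve/restore to keep the data)
contract Surgical_procedure, freq(n)

* highest count first; ties are broken alphabetically by procedure name
gsort -n Surgical_procedure
list Surgical_procedure n in 1/3, noobs
```

If the third and fourth counts are tied, this list keeps only one of them, so check the full `tab ..., sort` output in that case.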

Last edited by Andrew Musau; 12 Feb 2020, 06:57.


Nick Cox

Join Date: Mar 2014

Posts: 35594
#3

12 Feb 2020, 06:57

Your data example is clear but requires surgery to be readable by Stata. Please do read and act on the request to use dataex.

Otherwise this example can be run easily:

Code:

. clear

. set obs 100
number of observations (_N) was 0, now 100

. set seed 2803

. gen test = word("frog toad newt dragon lizard", runiformint(1,5))

. tab test, sort

       test |      Freq.     Percent        Cum.
------------+-----------------------------------
       toad |         23       23.00       23.00
     dragon |         20       20.00       43.00
       frog |         20       20.00       63.00
       newt |         19       19.00       82.00
     lizard |         18       18.00      100.00
------------+-----------------------------------
      Total |        100      100.00

. groups test, order(hi) select(3)

  +----------------------------------+
  |   test   Freq.   Percent     %<= |
  |----------------------------------|
  |   toad      23     23.00   23.00 |
  | dragon      20     20.00   43.00 |
  |   frog      20     20.00   63.00 |
  +----------------------------------+

except that you must install groups from the Stata Journal.

SJ-18-1 st0496_1 . . . . . . . . . . . . . . . . . Software update for groups
(help groups if installed) . . . . . . . . . . . . . . . . N. J. Cox
Q1/18 SJ 18(1):291
groups exited with an error message if weights were specified;
this has been corrected

SJ-17-3 st0496 . . . . . Speaking Stata: Tables as lists: The groups command
(help groups if installed) . . . . . . . . . . . . . . . . N. J. Cox
Q3/17 SJ 17(3):760--773
presents command for listing group frequencies and percents and
cumulations thereof; for various subsetting and ordering by
frequencies, percents, and so on; for reordering of columns;
and for saving tabulated data to new datasets

Software download from st0496_1 is free. The 2017 paper is accessible by subscription (until 2020Q3, when the paywall will be removed).

See also https://www.statalist.org/forums/for...updated-on-ssc which is free.

I can't see that this has anything to do with medians. The values shown could all be called modes.
Sigurd Sloth

Join Date: Jun 2019

Posts: 13
#4

18 Feb 2020, 00:35

Thanks to both of you! Solved my problem!
