How to find median number of observations per group

Elaine Tantalo

Join Date: Mar 2022

Posts: 7
#1

25 Jul 2023, 14:02

Hello,

I would like some advice on how to get a summary statistic from a somewhat complicated dataset. In this dataset, each row is a medical record. There is a column called id that holds the person's id number, but the same id can appear more than once, meaning the same person may have several medical records. I need to find the median number of records per person. (To be clear, I need one number representing the typical number of records a person in this dataset has.)

I have tried using egen to create groups based on id as well as the collapse command but so far have not been able to figure this out. I would appreciate any help!

Thank you,
Elaine
Clyde Schechter

Join Date: Apr 2014

Posts: 30122
#2

25 Jul 2023, 14:10

Something like this:

Code:

by person_id, sort: gen long num_records = _N
summ num_records, detail
local "Median number of records per person = " %2.1f =`r(p50)'

Note: Because no example data was provided, this code is untested and may contain typos or other errors. In the future, when asking for help with code, always provide example data. While it is sometimes possible to guess the structure and nature of the data without an example, when those guesses are wrong, those who help you waste their time writing code that cannot possibly work, and you waste yours attempting to run it and then re-posting about that problem. So always show example data when asking for help with code.

The helpful way to show example data is by using the -dataex- command. If you are running version 18, 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
Nick Cox

Join Date: Mar 2014

Posts: 35734
#3

25 Jul 2023, 14:35

Consider this toy dataset:

Code:

     +----+
     | id |
     |----|
  1. |  1 |
     |----|
  2. |  2 |
  3. |  2 |
     |----|
  4. |  3 |
  5. |  3 |
  6. |  3 |
  7. |  3 |
     +----+

The number of identifiers is 3 and the median number of records per person is 2 (from 1, 2, 4). But @Clyde Schechter's code will give 4, because each number of records is counted that many times: the values summarized are 1, 2, 2, 4, 4, 4, 4. I take it that you want the median across people, not records. If so, then @Clyde's code needs a tweak:

Code:

by person_id, sort: gen long num_records = _N
egen tag = tag(person_id)
summ num_records if tag, detail
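As a check, the two medians can be compared directly on the toy dataset above. This is only a sketch: the data are typed in with -input-, and the identifier is named id as in the listing rather than person_id.

```stata
clear
input byte id
1
2
2
3
3
3
3
end

* Number of records for each person, repeated on every row for that person
bysort id: gen long num_records = _N

* Median over records: each count enters as many times as its own size
summarize num_records, detail
display "Median over records = " %2.1f r(p50)

* Median over people: tag() marks exactly one observation per id
egen tag = tag(id)
summarize num_records if tag, detail
display "Median over people = " %2.1f r(p50)
```

The first -display- should report 4.0 and the second 2.0, which is the gap between the two approaches.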
Elaine Tantalo

Join Date: Mar 2022

Posts: 7
#4

25 Jul 2023, 14:41

Thanks for your help; this worked.

Apologies for not including sample data; I will be sure to do so next time.
Clyde Schechter

Join Date: Apr 2014

Posts: 30122
#5

25 Jul 2023, 14:52

Yes, Nick is right. Actually, the way I usually handle this is:

Code:

by person_id, sort: gen long num_records = _N if _n == 1
summ num_records, detail
local "Median number of records per person = " %2.1f =`r(p50)'

That avoids creating an extra variable that might not otherwise be needed.
Nick Cox

Join Date: Mar 2014

Posts: 35734
#6

25 Jul 2023, 15:13

local should be display in #2 and #5.
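For reference, a sketch of the #5 approach with that correction applied. The small dataset is made up purely for illustration, and person_id is the variable name assumed throughout the thread (the original question calls it id).

```stata
clear
* Hypothetical example data: three people with 1, 2 and 4 records
input byte person_id
1
2
2
3
3
3
3
end

* Store the group size only on the first row of each person; other rows are missing
bysort person_id: gen long num_records = _N if _n == 1

* summarize ignores missing values, so each person contributes once
summarize num_records, detail
display "Median number of records per person = " %2.1f r(p50)
```

Because -summarize- skips the missing values, no separate tag variable or if qualifier is needed.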
Clyde Schechter

Join Date: Apr 2014

Posts: 30122
#7

25 Jul 2023, 15:15

Yes, right again, Nick. Not sure what my brain was doing when I wrote those!?!