Calculating Mean and Percentile of a variable that is repeated several times

James Sonela

Join Date: Feb 2023

Posts: 15
#1

11 Oct 2023, 10:18

Hi,

I have a variable called VoterTurnout2013 that does not change over time within IDs in my panel dataset. I want to calculate the mean and percentiles of this variable for the whole dataset. However, the ordinary calculations are a problem because my panel is unbalanced and the number of observations differs across IDs. IDs with more (or fewer) observations therefore distort the mean and percentiles.

How can I calculate the mean and percentiles in this situation? Another thing I would like to ask: how does Stata treat missing (N/A) cells when calculating means? Are they dropped or treated as 0?

Thanks for your help. An example of my dataset is shown below.

City (ID)   Time   VoterTurnout2013
1           2013   88%
1           2014   88%
1           2015   88%
2           2013   72%
2           2014   72%
3           2013   79%
3           2014   79%
4           2013   91%
Last edited by James Sonela; 11 Oct 2023, 10:22.
James Sonela

Join Date: Feb 2023

Posts: 15
#2

13 Oct 2023, 06:47

Could someone please help me?
George Ford

Join Date: Aug 2014

Posts: 3182
#3

13 Oct 2023, 07:53

Since it doesn't change by time, you could collapse the data to get one value per city id, then compute the mean.

collapse (first) VoterTurnout2013 , by(cityid)

Or you could mark the first observation for the city id and restrict the mean computation accordingly.

https://www.stata.com/support/faqs/data-management/first-and-last-occurrences/
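
A minimal sketch of this second approach, assuming the variable names used in this thread (cityid, VoterTurnout2013) and that the turnout variable is stored as a number. The flag name first is made up for illustration. The egen function tag() marks exactly one observation per city, and summarize with the detail option then reports the mean and percentiles over the flagged rows only. On the missing-values question: summarize uses only nonmissing observations, so missing values are left out of the calculation rather than counted as 0.

```stata
* Sketch only: flag one row per city, then summarize the flagged rows.
* "first" is a hypothetical variable name.
egen byte first = tag(cityid)

* detail adds percentiles (p50, p75, ...) to the output and to r()
summarize VoterTurnout2013 if first, detail

* The stored results can be used afterwards
display r(mean)
display r(p50)
display r(p75)
```

If the values are stored as text such as "88%", they would need converting first, for example with destring VoterTurnout2013, replace ignore("%").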

Last edited by George Ford; 13 Oct 2023, 07:56.
James Sonela

Join Date: Feb 2023

Posts: 15
#4

13 Oct 2023, 19:16

Originally posted by George Ford

Since it doesn't change by time, you could collapse the data to get one value per city id, then compute the mean.

collapse (first) VoterTurnout2013 , by(cityid)

Or you could mark the first observation for the city id and restrict the mean computation accordingly.

https://www.stata.com/support/faqs/data-management/first-and-last-occurrences/

Thank you very much. I would like to add these newly calculated statistics to my dataset. I am thinking of saving the collapsed dataset as a separate file and merging it back on the ID variable (I have nine variables, and for each of them I will calculate the mean and the 50th and 75th percentiles, which makes 27 columns to add). If you have a quicker solution, I would appreciate it.

George Ford

Join Date: Aug 2014
Posts: 3182

14 Oct 2023, 12:22

Code:

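* flag one observation per city so that each city counts once
bys cid: g first = _n==1
foreach var in x1 x2 x3 x4 {
     * d (detail) makes summarize store percentiles in r()
     qui summ `var' if first , d
     * each new variable holds the same value in every row
     g `var'_mean = r(mean)
     g `var'_50 = r(p50)
     g `var'_75 = r(p75)
}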
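
As a rough check of why the first flag matters, the sketch below enters the example data from post #1 by hand (percent signs dropped, x1 standing in for VoterTurnout2013, cid for the city ID) and compares the mean over all rows with the mean over one row per city. It is meant to be run from a do-file and is an illustration under those assumptions, not part of the code above.

```stata
* Toy data copied from the example in post #1
clear
input cid year x1
1 2013 88
1 2014 88
1 2015 88
2 2013 72
2 2014 72
3 2013 79
3 2014 79
4 2013 91
end

* Naive mean over all 8 rows: city 1 is counted three times
quietly summarize x1
display r(mean)      // 82.125

* One row per city: each city is counted once
bysort cid: generate first = _n == 1
quietly summarize x1 if first
display r(mean)      // 82.5
```

The two means differ because cities with more years of data carry more weight in the naive calculation.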


James Sonela

Join Date: Feb 2023
Posts: 15

18 Oct 2023, 07:19

Originally posted by George Ford

Code:

bys cid: g first = _n==1
foreach var in x1 x2 x3 x4 {
     qui summ `var' if first , d
     g `var'_mean = r(mean)
     g `var'_50 = r(p50)
     g `var'_75 = r(p75)
}

Dear George, many thanks!
