
  • How to calculate weighted mean income

    I want to calculate a weighted mean income for each state.

    Here is part of the sample (the numbers are randomly chosen):
    state  year  returns  income
    AL     2017       53    1000
    AL     2018       45    1100
    AL     2019       63    1200
    AK     2018       21     500
    AK     2019       13     510
    AZ     2016       57     700
    AZ     2017       76     710
    AZ     2018       79     720
    AZ     2019       59     730
    I have a formula for the weighted mean income of each state: the sum over years of (that year's returns / total returns) * income.

    For Alabama: 53/(53+45+63)*1000 + 45/(53+45+63)*1100 + 63/(53+45+63)*1200 = 1106.21
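The formula is just a returns-weighted average of income within each state. As a quick sanity check of the arithmetic (plain Python rather than Stata; the function name is only illustrative):

```python
# Check the weighted-mean arithmetic from the post in plain Python (not Stata).
# Sample rows from the post: (state, year, returns, income)
rows = [
    ("AL", 2017, 53, 1000), ("AL", 2018, 45, 1100), ("AL", 2019, 63, 1200),
    ("AK", 2018, 21, 500), ("AK", 2019, 13, 510),
    ("AZ", 2016, 57, 700), ("AZ", 2017, 76, 710),
    ("AZ", 2018, 79, 720), ("AZ", 2019, 59, 730),
]

def weighted_mean_income(rows):
    """Per-state weighted mean: sum of (returns_t / total_returns) * income_t."""
    totals = {}
    for state, _, r, inc in rows:
        t = totals.setdefault(state, [0, 0.0])
        t[0] += r        # running total of returns
        t[1] += r * inc  # running returns-weighted income sum
    return {s: wsum / n for s, (n, wsum) in totals.items()}

wm = weighted_mean_income(rows)
print({s: round(v, 2) for s, v in wm.items()})
# → {'AL': 1106.21, 'AK': 503.82, 'AZ': 715.17}
```

This matches the hand calculation for Alabama (1106.21) and the target values for AK and AZ.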

    I want to create the data set below:
    state  year  returns  income  WMincome
    AL     2017       53    1000   1106.21
    AL     2018       45    1100   1106.21
    AL     2019       63    1200   1106.21
    AK     2018       21     500    503.82
    AK     2019       13     510    503.82
    AZ     2016       57     700    715.17
    AZ     2017       76     710    715.17
    AZ     2018       79     720    715.17
    AZ     2019       59     730    715.17
    My final data set should look like this:
    state  WMincome
    AL      1106.21
    AK       503.82
    AZ       715.17
    Since I have 50 states, I cannot do this manually. Please help me solve this problem.

    Thanks!
    Last edited by Jaeyong Yoo; 18 Oct 2020, 23:50.

  • #2
    There are several ways:

    1. CREATE WEIGHT VARIABLE

    Code:
    bys state: egen weight = total(returns)
    replace weight = returns/weight
    bys state: egen wanted = total(income*weight)

    2. EXPAND OBSERVATIONS

    Code:
    expand returns, g(new)
    bys state: egen wanted = mean(income)
    drop if new
    ADDED IN EDIT:

    My final data set should look like this:
    state  WMincome
    AL      1106.21
    AK       503.82
    AZ       715.17
    I missed this. collapse allows frequency weights.

    Code:
    collapse income [fw=returns], by(state)
    Last edited by Andrew Musau; 19 Oct 2020, 00:13.

    Comment


    • #3
      Thanks, Andrew. Much appreciated! Could you explain why you used the "expand returns, g(new)" code?

      Comment


      • #4
        Andrew is using expand because using frequency weights is equivalent to having a dataset in which each observation has been replaced by as many replicas of itself as its frequency weight indicates.

        Look at how the two approaches below produce the same outcome, and notice that the first is weighted while the second is expanded but not weighted:

        Code:
        . preserve
        
        . collapse (mean) income [fw = returns], by(state)
        
        . list, sep(0)
        
             +-----------------+
             | state    income |
             |-----------------|
          1. |    AK   503.824 |
          2. |    AL   1106.21 |
          3. |    AZ   715.166 |
             +-----------------+
        
        . restore
        
        . expand returns
        (457 observations created)
        
        . collapse income, by(state)
        
        . list, sep(0)
        
             +-----------------+
             | state    income |
             |-----------------|
          1. |    AK   503.824 |
          2. |    AL   1106.21 |
          3. |    AZ   715.166 |
             +-----------------+
        
        .
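The same equivalence can be checked outside Stata. A minimal Python sketch (sample data from the thread, hypothetical variable names) replicates each row `returns` times and shows that the plain mean of the expanded data matches the frequency-weighted mean:

```python
# Check (plain Python, not Stata) that a frequency-weighted mean equals the
# unweighted mean over expanded data, where each row is replicated `returns`
# times. Alabama rows from the thread: (state, returns, income)
rows = [("AL", 53, 1000), ("AL", 45, 1100), ("AL", 63, 1200)]

# Weighted mean for AL, computed directly.
weighted = sum(r * inc for _, r, inc in rows) / sum(r for _, r, _ in rows)

# Same mean via expansion: each income value repeated `returns` times.
expanded = [inc for _, r, inc in rows for _ in range(r)]
unweighted = sum(expanded) / len(expanded)

assert abs(weighted - unweighted) < 1e-9
print(round(weighted, 2))  # → 1106.21
```

This is exactly why `collapse income [fw=returns]` and `expand returns` followed by an unweighted `collapse` give identical results.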


        Comment


        • #5
          If you want a final dataset with one observation per state, the above is the way to go.

          However, if you want to preserve your original data structure and just add the mean as a new variable, you can also consider the very useful -egen, wtmean- function by David Kantor.

          First
          Code:
          . findit _gwtmean
          and follow the instructions to install it. Then

          Code:
          . egen WMincome = wtmean(income), weight(returns) by(state)
          
          . list, sep(0)
          
               +--------------------------------------------+
               | state   year   returns   income   WMincome |
               |--------------------------------------------|
            1. |    AL   2017        53     1000   1106.211 |
            2. |    AL   2018        45     1100   1106.211 |
            3. |    AL   2019        63     1200   1106.211 |
            4. |    AK   2018        21      500   503.8235 |
            5. |    AK   2019        13      510   503.8235 |
            6. |    AZ   2016        57      700   715.1661 |
            7. |    AZ   2017        76      710   715.1661 |
            8. |    AZ   2018        79      720   715.1661 |
            9. |    AZ   2019        59      730   715.1661 |
               +--------------------------------------------+
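What wtmean() does here can be sketched outside Stata as well: compute one weighted mean per state, then attach it to every row of that state. A hypothetical plain-Python version over part of the sample:

```python
# Sketch (plain Python, not Stata) of the wtmean() behavior in this thread:
# a per-state weighted mean of income, weighted by returns, broadcast back to
# every row of that state as a new WMincome field.
rows = [
    {"state": "AL", "year": 2017, "returns": 53, "income": 1000},
    {"state": "AL", "year": 2018, "returns": 45, "income": 1100},
    {"state": "AL", "year": 2019, "returns": 63, "income": 1200},
    {"state": "AK", "year": 2018, "returns": 21, "income": 500},
    {"state": "AK", "year": 2019, "returns": 13, "income": 510},
]

# Accumulate (total weight, weighted income sum) per state.
sums = {}
for row in rows:
    w, wi = sums.get(row["state"], (0, 0))
    sums[row["state"]] = (w + row["returns"], wi + row["returns"] * row["income"])
wmean = {s: wi / w for s, (w, wi) in sums.items()}

# Broadcast back: one WMincome value per row, constant within each state.
for row in rows:
    row["WMincome"] = round(wmean[row["state"]], 4)

print(rows[0]["WMincome"], rows[3]["WMincome"])  # → 1106.2112 503.8235
```

Note that, unlike collapse, the original rows (state, year, returns, income) are all kept.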

          Comment


          • #6
            Thanks for your help, Joro!!

            Comment
