
  • How to calculate weighted mean income

    I want to calculate a weighted mean income for each state.

    Here is part of the sample (the numbers are randomly chosen):
    state  year  returns  income
    AL     2017       53    1000
    AL     2018       45    1100
    AL     2019       63    1200
    AK     2018       21     500
    AK     2019       13     510
    AZ     2016       57     700
    AZ     2017       76     710
    AZ     2018       79     720
    AZ     2019       59     730
    I have a formula for the weighted mean income of each state: the sum over years of (that year's returns / total returns) * income.

    For Alabama: 53/(53+45+63)*1000 + 45/(53+45+63)*1100 + 63/(53+45+63)*1200 = 1106.21
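The formula is just a returns-weighted average of income within each state. As a quick sanity check of the arithmetic (plain Python rather than Stata; the function name is only illustrative):

```python
# Check the weighted-mean arithmetic from the post in plain Python (not Stata).
# Sample rows from the post: (state, year, returns, income)
rows = [
    ("AL", 2017, 53, 1000), ("AL", 2018, 45, 1100), ("AL", 2019, 63, 1200),
    ("AK", 2018, 21, 500), ("AK", 2019, 13, 510),
    ("AZ", 2016, 57, 700), ("AZ", 2017, 76, 710),
    ("AZ", 2018, 79, 720), ("AZ", 2019, 59, 730),
]

def weighted_mean_income(rows):
    """Per-state weighted mean: sum of (returns_t / total_returns) * income_t."""
    totals = {}
    for state, _, r, inc in rows:
        t = totals.setdefault(state, [0, 0.0])
        t[0] += r        # running total of returns
        t[1] += r * inc  # running returns-weighted income sum
    return {s: wsum / n for s, (n, wsum) in totals.items()}

wm = weighted_mean_income(rows)
print({s: round(v, 2) for s, v in wm.items()})
# → {'AL': 1106.21, 'AK': 503.82, 'AZ': 715.17}
```

This matches the hand calculation for Alabama (1106.21) and the target values for AK and AZ.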

    I want to create the data set below:
    state  year  returns  income  WMincome
    AL     2017       53    1000   1106.21
    AL     2018       45    1100   1106.21
    AL     2019       63    1200   1106.21
    AK     2018       21     500    503.82
    AK     2019       13     510    503.82
    AZ     2016       57     700    715.17
    AZ     2017       76     710    715.17
    AZ     2018       79     720    715.17
    AZ     2019       59     730    715.17
    My final data set should look like this:
    state  WMincome
    AL      1106.21
    AK       503.82
    AZ       715.17
    Since I have 50 states, I cannot do this manually. Please help me solve this problem.

    Thanks!
    Last edited by Jaeyong Yoo; 18 Oct 2020, 23:50.

  • #2
    There are several ways:

    1. CREATE WEIGHT VARIABLE

    Code:
    bys state: egen weight = total(returns)
    replace weight = returns/weight
    bys state: egen wanted = total(income*weight)

    2. EXPAND OBSERVATIONS

    Code:
    expand returns, g(new)
    bys state: egen wanted = mean(income)
    drop if new
    ADDED IN EDIT:

    My final data set should look like this:
    state  WMincome
    AL      1106.21
    AK       503.82
    AZ       715.17
    I missed this. collapse allows frequency weights.

    Code:
    collapse income [fw=returns], by(state)
    Last edited by Andrew Musau; 19 Oct 2020, 00:13.

    Comment


    • #3
      Thanks, Andrew. Much appreciated! Could you explain why you used the "expand returns, g(new)" code?

      Comment


      • #4
        Andrew is using expand because using frequency weights is equivalent to having a dataset in which each observation has been replaced by as many replicas of itself as its frequency weight indicates.

        Look at how the two approaches below produce the same outcome, and notice that the first is weighted while the second is expanded but not weighted:

        Code:
        . preserve
        
        . collapse (mean) income [fw = returns], by(state)
        
        . list, sep(0)
        
             +-----------------+
             | state    income |
             |-----------------|
          1. |    AK   503.824 |
          2. |    AL   1106.21 |
          3. |    AZ   715.166 |
             +-----------------+
        
        . restore
        
        . expand returns
        (457 observations created)
        
        . collapse income, by(state)
        
        . list, sep(0)
        
             +-----------------+
             | state    income |
             |-----------------|
          1. |    AK   503.824 |
          2. |    AL   1106.21 |
          3. |    AZ   715.166 |
             +-----------------+
        
        .
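The same equivalence can be checked outside Stata. A minimal Python sketch (sample data from the thread, hypothetical variable names) replicates each row `returns` times and shows that the plain mean of the expanded data matches the frequency-weighted mean:

```python
# Check (plain Python, not Stata) that a frequency-weighted mean equals the
# unweighted mean over expanded data, where each row is replicated `returns`
# times. Alabama rows from the thread: (state, returns, income)
rows = [("AL", 53, 1000), ("AL", 45, 1100), ("AL", 63, 1200)]

# Weighted mean for AL, computed directly.
weighted = sum(r * inc for _, r, inc in rows) / sum(r for _, r, _ in rows)

# Same mean via expansion: each income value repeated `returns` times.
expanded = [inc for _, r, inc in rows for _ in range(r)]
unweighted = sum(expanded) / len(expanded)

assert abs(weighted - unweighted) < 1e-9
print(round(weighted, 2))  # → 1106.21
```

This is exactly why `collapse income [fw=returns]` and `expand returns` followed by an unweighted `collapse` give identical results.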


        Comment


        • #5
          If you want a final dataset with one observation per state, the above is the way to go.

          However, if you want to preserve your original data structure and just add the mean as a new variable, you can also consider the very useful -egen, wtmean- function by David Kantor.

          First
          Code:
          . findit _gwtmean
          and follow the instructions to install it. Then

          Code:
          . egen WMincome = wtmean(income), weight(returns) by(state)
          
          . list, sep(0)
          
               +--------------------------------------------+
               | state   year   returns   income   WMincome |
               |--------------------------------------------|
            1. |    AL   2017        53     1000   1106.211 |
            2. |    AL   2018        45     1100   1106.211 |
            3. |    AL   2019        63     1200   1106.211 |
            4. |    AK   2018        21      500   503.8235 |
            5. |    AK   2019        13      510   503.8235 |
            6. |    AZ   2016        57      700   715.1661 |
            7. |    AZ   2017        76      710   715.1661 |
            8. |    AZ   2018        79      720   715.1661 |
            9. |    AZ   2019        59      730   715.1661 |
               +--------------------------------------------+
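What wtmean() does here can be sketched outside Stata as well: compute one weighted mean per state, then attach it to every row of that state. A hypothetical plain-Python version over part of the sample:

```python
# Sketch (plain Python, not Stata) of the wtmean() behavior in this thread:
# a per-state weighted mean of income, weighted by returns, broadcast back to
# every row of that state as a new WMincome field.
rows = [
    {"state": "AL", "year": 2017, "returns": 53, "income": 1000},
    {"state": "AL", "year": 2018, "returns": 45, "income": 1100},
    {"state": "AL", "year": 2019, "returns": 63, "income": 1200},
    {"state": "AK", "year": 2018, "returns": 21, "income": 500},
    {"state": "AK", "year": 2019, "returns": 13, "income": 510},
]

# Accumulate (total weight, weighted income sum) per state.
sums = {}
for row in rows:
    w, wi = sums.get(row["state"], (0, 0))
    sums[row["state"]] = (w + row["returns"], wi + row["returns"] * row["income"])
wmean = {s: wi / w for s, (w, wi) in sums.items()}

# Broadcast back: one WMincome value per row, constant within each state.
for row in rows:
    row["WMincome"] = round(wmean[row["state"]], 4)

print(rows[0]["WMincome"], rows[3]["WMincome"])  # → 1106.2112 503.8235
```

Note that, unlike collapse, the original rows (state, year, returns, income) are all kept.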

          Comment


          • #6
            Thanks for your help, Joro!!

            Comment
