  • How to collapse a weighted sum by two groups, but normalize the weights by only one of the groups

    Hi all,

    I'm looking for some help with collapsing to a weighted sum. I'm currently working with a US income data set. It contains the variables year, income, and weight, and I'm trying to calculate the income share of the top 10 percent.

    What I did was:
    1) For each year, I used the xtile function to divide observations into 10 groups (bottom 10%, 10%-20%, ... up to the top 10%). Let's call this new variable percentile (percentile = 1, 2, ..., 10).
    Now I have one large data set containing all the observations and the variables year, income, weight, and percentile.
    2) Collapse sum by year and percentile. My code is: collapse (sum) income [aweight = weight], by(year percentile) (weight is a float, so I used aweight in my code.)
    3) Following step 2, sum up income by year to get total income for each year. For each percentile, income share = income/total income.

    The problem with this code is that, when calculating an aweighted sum, Stata normalizes the aweights. Ideally, I would like the weights to be normalized within each year, and then to sum the correctly weighted incomes within each percentile group. But since I'm collapsing by both year and percentile, Stata normalizes the weights within each year-percentile cell instead.

    Is there a way to get around this? Thank you in advance!
    Last edited by Sherry Lin; 25 Sep 2017, 20:49.

  • #2
    You can first normalize the weights within each year yourself and then calculate the weighted sums using -collapse- with iweights.
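
    In symbols (the notation here is only illustrative, not from the original question): let x_i be the income and w_i the raw weight of observation i, N_y the number of observations in year y, and (y,d) the set of observations in year y and decile d. The year-normalized weight and the weighted sum for each year-decile cell are then roughly:

    ```latex
    \tilde{w}_i = \frac{N_y \, w_i}{\sum_{j \in y} w_j},
    \qquad
    S_{y,d} = \sum_{i \in (y,d)} \tilde{w}_i \, x_i
    ```

    As the question describes it, using aweights with by(year percentile) would instead take the normalizing sum over each year-percentile cell, which is the behavior to avoid here.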

    It would have been better had you provided an example of your data to work on this code with. I have generated a toy data set that matches my interpretation of what you described in your post. But if your data are not organized as mine are then we will have both wasted our time. In the future, please provide example data sets when you want help with code. In providing future example data sets, be sure to use the -dataex- command. Run -ssc install dataex- and then run -help dataex- to read the simple instructions for using it. By using -dataex- you make it possible for those who want to help you to create a complete and faithful replica of your Stata example with a simple copy/paste operation. Use -dataex- whenever you post example data.

    Code:
    clear*
    
    //    CREATE TOY DATA SET
    set obs 5
    gen year = 2000 + _n
    expand 100
    set seed 1234
    gen income = rgamma(2, 25000)
    gen weight = runiform()
    
    //    CALCULATE (UNWEIGHTED) DECILES OF INCOME WITHIN EACH YEAR
    //    ceil(10*_n/_N) WORKS FOR ANY NUMBER OF OBSERVATIONS PER YEAR
    by year (income), sort: gen decile = ceil(10*_n/_N)
    
    //    NORMALIZE WEIGHTS WITHIN YEARS, NOT ACROSS THE ENTIRE DATA SET
    by year, sort: egen weight_total = total(weight)
    by year: gen year_normalized_weight = _N*weight/weight_total
    
    //    NOW COLLAPSE USING THESE NORMALIZED WEIGHTS AS IWEIGHTS
    collapse (sum) income [iweight = year_normalized_weight], by(year decile)
    Added: Note that this approach works because -collapse- does not normalize or otherwise tamper with iweights when computing sums, so the hand-coded normalization by year done before -collapse- is respected.
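
    To get from these sums to the income shares in step 3 of the question, one possible continuation is sketched below. It assumes the collapsed data from the code above (one row per year and decile); the names total_income and income_share are made up for illustration.

    ```stata
    * Continues from the collapsed data: one row per year and decile.
    * total_income and income_share are illustrative names only.
    by year, sort: egen total_income = total(income)
    gen income_share = income / total_income

    * Income share of the top decile in each year
    list year income_share if decile == 10
    ```

    Because every weight within a year is scaled by the same factor, that factor cancels in the ratio, so the shares within a year do not depend on how the year-level weights are scaled.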
    Last edited by Clyde Schechter; 25 Sep 2017, 22:51.
