Looping running sum and total for changing range within grouped variables.

Moritz Huth

Join Date: Mar 2023
Posts: 14

Looping running sum and total for changing range within grouped variables.

08 Mar 2023, 13:24

Hi everyone,

I'm relatively new to Stata so please excuse if the explanation of my problem sounds a bit all over the place. I will try my best:

Here is what my data looks like:

Code:

   rank
percent
weight

1
1
2000

2
1
3000

3
2
7000

4
2
1200

5
3
3000

6
3
4500


 end

What I want to do is build a loop that for the first group, percent = 1
- generates a variable as the cumulative sum of weight of each observation
- generates a variable as the total sum of weight in that group

And for the second group, percent = 2
- generates a variable as the cumulative sum of each observation from percent = 1 and percent = 2 by the cumulative total of the weights from Perc 1 and 2,
- generates a variable as the total sum of weight from percent = 1 and percent = 2 and so on

In my data the values for percent range from 1-100. However, the loop should only address the first 10 groups, i.e. the first 10 percentile thresholds.

Here is what I tried in Stata, but the results doesn't seem right:

Code:

local k = 1
foreach var of varlist percent_* {
    foreach x in percent_`k' {
        bysort percent_`k' (rank`k'): gen ni_`k' = sum(weight) if `x' == `x'[_n+1]
        bysort percent_`k': egen nall_`k' = total(weight) if `x' == `x'[_n+1]
    }
    local k = `k' + 1
}
end

To give an example of what I'm looking for, eventually it should look like this:

Code:

Any help is much appreciated.

Thank you
Moritz

Last edited by Moritz Huth; 08 Mar 2023, 13:27. Reason: Spelling

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30117
#2

08 Mar 2023, 17:17

The results you show are clearer than your explanation in words, so thank you for that. But they are also a very un-Stata-ish way of organizing information. I wonder what you are planning to do with these Ni and Nall variables. Perhaps if your ultimate purpose were clear, a better approach could be suggested. I say that because the arrangement with N1, Nall1, N2, Nall2, etc. is highly redundant. There is very little in the way of further data management or calculations that could be done with this in Stata, as far as I can see.

So, the following code simply generates two variables: running_sum, and group_total. All of the numbers you wanted calculated are there, although not arranged in the way you showed. It is not that hard to create the other variables from these, but as I can't see how it will be helpful to do so, I have not written that part of the code.

Code:

* Example generated by -dataex-. For more info, type help dataex clear input int(rank percent) float weight 1 1 2000 2 1 3000 3 2 7000 4 2 1200 5 3 3000 6 3 4500 end gen running_sum = sum(weight) by percent (rank), sort: gen group_total = running_sum[_N]

In the future, when showing data examples, please use the -dataex- command to do so, as I have done here. If you are running version 17, 16 or a fully updated version 15.1 or 14.2, -dataex- is already part of your official Stata installation. If not, run -ssc install dataex- to get it. Either way, run -help dataex- to read the simple instructions for using it. -dataex- will save you time; it is easier and quicker than typing out tables. It includes complete information about aspects of the data that are often critical to answering your question but cannot be seen from tabular displays or screenshots. It also makes it possible for those who want to help you to create a faithful representation of your example to try out their code, which in turn makes it more likely that their answer will actually work in your data.
Comment
Moritz Huth

Join Date: Mar 2023

Posts: 14
#3

09 Mar 2023, 04:56

Thank you for your response, Clyde. I will keep your suggestions towards -dataex- in mind.

Here is some context:
I have an imputed panel dataset on wealth distribution on a household level for which I want to fit a Pareto tail to. To do so, in a first step I need to estimate the distribution function based on my data using OLS, to identify the shape parameter alpha. The regression equation would look like this: ln(CCDF) = ß0 - ß1 ln(wealth)

The CCDF is calculated by the running sum of weights (ni) and the total sum of weights (nall) = ni/nall. So in my original post I wanted to calculate the CCDF for the first 10 percentiles of the distribution using a loop. But not for each percentile individually but by accumulating them for every new variable produced by the loop.

So like in my example above the ni, nall and CCDF should be:
Percentile 1: {2000, 3000} -> ni1/nall1 -> {2000/5000, 5000/5000} = CCDF1 = {0.4, 1}
Percentile 2 : {2000, 3000, 7000, 1200} -> ni2/nall2 -> {2000/13200, 5000/13200, 12000/13200, 13200/13200} = CCDF2 = {0.15, 0.38, 0.9, 1}, and so on until the 10th percentile.

In an earlier attempt I tried the following, but this loop only calculates the running sum and total sum for each percentile individually:

Code:

local i = 1 foreach var of varlist *_net_wealth { sort `var' egen ranknw_`i' = rank(-`var'), unique xtile percent_`i' = -`var' [weight = weight], nquantiles(100) gen log_nw_`i' = log(`var') local i = `i' + 1 } local j = 1 foreach var of varlist percent_* { foreach x in percent_`j' { bysort percent_`j' (ranknw_`j'): gen ni_`j' = sum(weight) if `x' <= 10 bysort percent_`j' (ranknw_`j'): egen nall_`j' = total(weight) if `x' <= 10 gen ccdf_`j' = ni_`j'/nall_`j' gen log_ccdf_`j' = log(ccdf_`j') } local j = `j' + 1 } forvalues p = 1/5 { bysort pct_percent_`p': eststo: quietly reg log_ccdf_`p' log_nw_`p' if pct_percent_`p' <= 10 }

I hope this helps to better understand my problem and doesn't cause even more confusion haha

Thank you
Moritz

Last edited by Moritz Huth; 09 Mar 2023, 05:14.
Comment

Moritz Huth

Join Date: Mar 2023
Posts: 14

09 Mar 2023, 06:31

For creating ni and nall for the cumulated percentiles, here ist what the code would look like without the loop:

Code:

sort rank
gen ni_1 = sum(weight) if percent == 1
egen nall_1 = total(weight) if percent == 1

gen ni_2 = sum(weight) if percent <= 2
egen nall_2 = total(weight) if percent <= 2

...

gen ni_10 = sum(weight) if percent <= 10
egen nall_10 = total(weight) if percent <= 10

But unfortunately I'm not able to create the loop. Any ideas?

Thank you
Moritz

Comment

Clyde Schechter

Join Date: Apr 2014
Posts: 30117

09 Mar 2023, 10:57

OK, I get what you are trying to do.

Code:

* Example generated by -dataex-. For more info, type help dataex
clear
input int(rank percent) float weight
1 1 2000
2 1 3000
3 2 7000
4 2 1200
5 3 3000
6 3 4500
end

gen running_sum = sum(weight)
by percent (rank), sort: gen group_total = running_sum[_N]

forvalues i = 1/3 {
    gen ni_`i' = running_sum if percent <= `i'
    egen n_all_`i' = min(cond(percent == `i', group_total, .))
}

Comment

Moritz Huth

Join Date: Mar 2023

Posts: 14
#6

10 Mar 2023, 08:12

This worked out perfectly, thank you!
Comment

Announcement

Looping running sum and total for changing range within grouped variables.

Comment

Comment

Comment

Comment

Comment