Rolling 10-year age intervals

Filippo Colonna

Join Date: Nov 2016

Posts: 6
#1

17 Mar 2017, 12:55

Hello,

I am trying to create a variable called "mean reference wage", which is equal to the mean wage of everyone in a specified category (using the egen command). To specify the category I group people of similar age using the bysort command. Thus far I have managed to create 5 age groups of 10 years each and bysorted using age group. For example, people who are 21-30 will be in the same category, then 31-40, 41-50, etc. Using bysort and egen I therefore get the mean wage of people in each age group.

Now instead of that I would like to define the age group such that people compare themselves to individuals who are up to 3 years younger and up to 6 years older. So someone who is 29 would care about the mean wage of people between the ages of 26 and 35 (not: 21-30 as is the case now). Is there any way of doing that?

Thank you!
Nick Cox

Join Date: Mar 2014

Posts: 35699
#2

17 Mar 2017, 13:51

Search the forum for mentions of rangestat from SSC. Your command might resemble

Code:

rangestat refmean = wage, interval(age -3 +6) by(somecatvar) excludeself
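To see what the window does before running it on real data, a minimal sketch on made-up data may help. The five observations and the variable name n_ref are invented for illustration, and by(somecatvar) is dropped so that the example stands alone. It assumes rangestat is already installed (ssc install rangestat).

Code:

clear
input age wage
25 100
26 200
29 300
35 400
36 500
end
* each observation's window is [age - 3, age + 6], bounds inclusive
* excludeself leaves the observation's own wage out of its reference mean
rangestat (mean) refmean = wage (count) n_ref = wage, interval(age -3 6) excludeself
list, clean

With these data the 29-year-old's window runs from 26 to 35, so that observation is compared with the 26- and 35-year-olds only; the 25- and 36-year-olds fall outside it.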
Filippo Colonna

Join Date: Nov 2016

Posts: 6
#3

17 Mar 2017, 14:32

Thank you!! That was very useful. Do you know whether it would be possible to use the rangestat command (in a way similar to the command you wrote above) to calculate the mean wage of people in the comparison group but only of the people whose wage is higher than yours (and then do something similar to calculate the mean wage of those below your wage)?
Nick Cox

Join Date: Mar 2014

Posts: 35699
#4

17 Mar 2017, 15:29

Yes, I do know the answer.... This follows immediately from reading the help for rangestat, including its examples.
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#5

17 Mar 2017, 15:31

You can do that with the same command; it's just a matter of setting the appropriate interval. So to identify an interval that includes all wages higher than yours, you need two things: as its upper bound, a value that is at least as high as any wage in the data set, and as its lower bound, the lowest wage that exceeds the index wage. If you are familiar with your data set, you can probably pick those out easily. You may know, for example, that your wage variable is coded in whole dollars (euros, pounds, whatever), so if my wage is X, then any higher wage is at least X+1, and you may know that the highest wage in the data set is less than 1 billion currency units. Similarly for all lower wages, it might be that the lowest possible wage can't be negative, so zero would be a suitable lower bound. Those are the easy cases. If you can't pull the upper and lower bounds out of thin air, then you can calculate them as follows:

Code:

summ wage, meanonly
local lowest = r(min)
local highest = r(max)

// SMALLEST POSITIVE GAP BETWEEN DISTINCT WAGES
sort wage
gen delta = wage - wage[_n-1]
summ delta if delta > 0, meanonly
local mesh = r(min)

// FIND MEAN OF ALL WAGES HIGHER
gen lower_bound = wage + `mesh'
gen upper_bound = `highest'
rangestat mean_higher_wages = wage, interval(wage lower_bound upper_bound)

// FIND MEAN OF ALL WAGES LOWER
replace upper_bound = wage - `mesh'
replace lower_bound = `lowest'
rangestat mean_lower_wages = wage, interval(wage lower_bound upper_bound)

Evidently, if you already know appropriate upper and lower bounds, just put those in the interval() option directly without calculating them as variables.

Added: crossed with Nick's response. Also corrected typo.
Nick Cox

Join Date: Mar 2014

Posts: 35699
#6

18 Mar 2017, 02:23

Clyde is right. In practice, pulling +/- terms out of the air that you know lie beyond the empirical extremes is fine too.

Code:

rangestat refmean = wage, interval(wage 0 1e8) by(somecatvar) excludeself