  • Rolling 10-year age intervals

    Hello,

    I am trying to create a variable called "mean reference wage", which is equal to the mean wage of everyone in a specified category (using the egen command). To specify the category, I group people of similar age using the bysort command. Thus far I have managed to create 5 age groups of 10 years each and used bysort on age group. For example, people who are 21-30 will be in the same category, then 31-40, 41-50, etc. Using bysort and egen I therefore get the mean wage of people in each age group.

    Now instead of that I would like to define the age group such that people compare themselves to individuals who are up to 3 years younger and up to 6 years older. So someone who is 29 would care about the mean wage of people between the ages of 26 and 35 (not: 21-30 as is the case now). Is there any way of doing that?

    Thank you!

  • #2
    Search the forum for mentions of rangestat from SSC. Your command might resemble

    Code:
    rangestat (mean) refmean = wage, interval(age -3 6) by(somecatvar) excludeself
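
    To see what that interval does, here is a minimal sketch on made-up data. The ages and wages are invented for illustration, by() is left out for brevity, and it assumes rangestat is already installed (ssc install rangestat).

    ```stata
    * Toy data: five people with invented ages and wages
    clear
    input byte age double wage
    25 100
    26 200
    29 350
    35 400
    36 900
    end

    * Mean wage of everyone else aged age-3 to age+6 (bounds inclusive)
    rangestat (mean) refmean = wage, interval(age -3 6) excludeself

    * The 29-year-old's window is 26 to 35, so after dropping their
    * own wage the reference mean is (200 + 400)/2
    assert refmean == 300 if age == 29
    list, clean noobs
    ```

    The numeric bounds are offsets from each observation's own age, which is why one command covers every person's window.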

    • #3
      Thank you!! That was very useful. Do you know whether it would be possible to use the rangestat command (in a way similar to the command you wrote above) to calculate the mean wage of people in the comparison group but only of the people whose wage is higher than yours (and then do something similar to calculate the mean wage of those below your wage)?

      • #4
        Yes, I do know the answer.... This follows immediately from reading the help for rangestat, including its examples.

        • #5
          You can do that with the same command; it's just a matter of setting the appropriate interval. To define an interval that includes all wages higher than yours, you need an upper bound at least as large as the highest wage in the data set, and a lower bound equal to the lowest wage that exceeds the index wage. If you are familiar with your data set, you can probably pick those out easily. You may know, for example, that your wage variable is coded in whole dollars (euros, pounds, whatever), so if my wage is X, then any higher wage is at least X+1, and you may know that the highest wage in the data set is less than 1 billion currency units. Similarly, for all lower wages, it may be that wages can't be negative, so zero would be a suitable lower bound. Those are the easy cases. If you can't pull the upper and lower bounds out of thin air, then you can calculate them as follows:

          Code:
          // LOWEST AND HIGHEST OBSERVED WAGES
          summ wage, meanonly
          local lowest = r(min)
          local highest = r(max)
          
          // SMALLEST POSITIVE GAP BETWEEN CONSECUTIVE SORTED WAGES
          sort wage
          gen double delta = wage - wage[_n-1]
          summ delta if delta > 0, meanonly
          local mesh = r(min)
          
          // FIND MEAN OF ALL WAGES HIGHER
          gen double lower_bound = wage + `mesh'
          gen double upper_bound = `highest'
          rangestat (mean) mean_higher_wages = wage, interval(wage lower_bound upper_bound)
          
          // FIND MEAN OF ALL WAGES LOWER
          replace upper_bound = wage - `mesh'
          replace lower_bound = `lowest'
          rangestat (mean) mean_lower_wages = wage, interval(wage lower_bound upper_bound)
          Evidently, if you just know the appropriate upper and lower bounds, just stick those in there without calculating them as variables.
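
          As a sanity check on the mesh logic, here is a small sketch on invented wages with a tie at the bottom. The data are made up for illustration, and it assumes rangestat is installed from SSC.

          ```stata
          * Toy data with a tie at the lowest wage (values invented)
          clear
          input double wage
          10
          10
          12
          15
          end

          summ wage, meanonly
          local highest = r(max)

          * Smallest positive gap between consecutive sorted wages: here 2
          sort wage
          gen double delta = wage - wage[_n-1]
          summ delta if delta > 0, meanonly
          local mesh = r(min)

          * Bounds given as variables are used as the interval limits directly
          gen double lower_bound = wage + `mesh'
          gen double upper_bound = `highest'
          rangestat (mean) mean_higher_wages = wage, interval(wage lower_bound upper_bound)

          * For wage 10 the interval is [12, 15], so the mean is (12 + 15)/2;
          * for wage 12 it is [14, 15], which holds only the wage of 15
          assert mean_higher_wages == 13.5 if wage == 10
          assert mean_higher_wages == 15 if wage == 12
          list, clean noobs
          ```

          Adding the mesh keeps tied wages out of the "higher" interval without skipping the next distinct wage. The top earner's interval is empty, so that observation should come back missing.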

          Added: crossed with Nick's response. Also corrected typo.

          • #6
            Clyde is right. In practice pulling +/- terms out of the air that you know lie beyond empirical extremes is fine too.

            Code:
             
             rangestat (mean) refmean = wage, interval(wage 0 1e8) by(somecatvar) excludeself
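
            One detail worth checking on toy data: numeric bounds are offsets from each observation's own wage, so a lower offset of 0 keeps ties in the interval and excludeself drops only the observation itself. A minimal sketch, with invented wages, by() omitted, and rangestat assumed installed from SSC:

            ```stata
            * Toy data with a tie at the lowest wage (values invented)
            clear
            input double wage
            10
            10
            12
            15
            end

            * Interval is [wage, wage + 1e8] for each observation
            rangestat (mean) refmean = wage, interval(wage 0 1e8) excludeself

            * Each person with wage 10 is averaged with the other wage
            * of 10 plus 12 and 15, so the mean is 37/3
            assert abs(refmean - 37/3) < 1e-8 if wage == 10
            assert refmean == 15 if wage == 12
            list, clean noobs
            ```

            If tied wages should not count as "higher", a small positive lower offset in place of 0 would be one way to leave them out.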
