
  • How to find the mean position of females in the male wage distribution?

    Hi all,
    I am working on the gender wage gap. Some papers measure the gap by the position of females in the male wage distribution: each woman is assigned her percentile in the male wage distribution, and these percentile rankings are then averaged. How can I implement this?
    I think I can use the cumul command to describe the male wage distribution, but I don't know how to find each woman's position in this cumulative distribution. Can anyone help? Thanks!

    Xiaoyang
    Last edited by Xiaoyang Xu; 17 Nov 2017, 08:40.

  • #2
    Here's an example of how this can be done:

    Code:
    clear*
    webuse hsng
    
    //    CREATE A "SEX" VARIABLE JUST FOR ILLUSTRATION
    gen byte female = mod(region, 2)
    
    tabstat faminc, by(female)
    
    //    CALCULATE CUMULATIVE DISTRIBUTION OF MALES
    cumul faminc if !female, gen(male_distrib)
    
    //    LOCATE FEMALES WITHIN MALE DISTRIBUTION
    sort faminc
    ipolate male_distrib faminc, epolate gen(overall)
    replace overall = max(min(overall, 1), 0)
    
    summ overall if female
    Note that there can be (and in this example there are) values of family income among females that fall outside the observed range of the male distribution. Extrapolation can lead to values outside the 0-1 range for these, so the code simply replaces negatives by 0 and results > 1 by 1. A slightly more "sophisticated" approach would be to use a logit transformation before interpolation and then back-transform after:

    Code:
    clear*
    webuse hsng
    
    //    CREATE A "SEX" VARIABLE JUST FOR ILLUSTRATION
    gen byte female = mod(region, 2)
    
    tabstat faminc, by(female)
    
    //    CALCULATE CUMULATIVE DISTRIBUTION OF MALES
    cumul faminc if !female, gen(male_distrib)
    
    //    LOCATE FEMALES WITHIN MALE DISTRIBUTION
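    //    NOTE: cumul'S LARGEST VALUE IS 1 AND logit(1) IS MISSING, SO THE HIGHEST
    //    MALE VALUE IS ITSELF FILLED IN BY ipolate BELOW
    gen logit_male_distrib = logit(male_distrib)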
    sort faminc
    ipolate logit_male_distrib faminc, epolate gen(logit_overall)
    gen overall = invlogit(logit_overall)
    
    summ overall if female
    In truth, when there are female incomes outside the observed male distribution's range, you are reduced to making some kind of assumption about where they would be if you had a more complete sample of males, and in the absence of external information about the shape of the distribution or something like that, you are just flying blind on these observations. If there are many of them, there is a problem. If there are only a few, then probably their influence on the result will be minimal.
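
    A further option, offered here only as a sketch and not taken from the posts in this thread, is to skip interpolation and use the empirical step function directly: each observation's position is the share of males whose faminc is at or below its own. Under that definition, females below the male minimum get 0 and females above the male maximum get 1, which matches the clamping in the first example. The names males_le, position, male_min, and male_max are made up for illustration, and the code assumes faminc has no missing values.

    ```stata
    clear*
    webuse hsng

    //    SAME ILLUSTRATIVE "SEX" VARIABLE AS IN THE EXAMPLES ABOVE
    gen byte female = mod(region, 2)

    //    FOR EACH OBSERVATION, COUNT MALES WITH faminc AT OR BELOW ITS VALUE
    //    (ASSUMES faminc HAS NO MISSING VALUES)
    sort faminc
    gen long males_le = sum(!female)
    by faminc: replace males_le = males_le[_N]    // tied incomes share one count

    //    DIVIDE BY THE NUMBER OF MALES TO GET A POSITION BETWEEN 0 AND 1
    count if !female
    gen double position = males_le / r(N)

    summ position if female

    //    COUNT FEMALES OUTSIDE THE OBSERVED MALE RANGE
    summ faminc if !female, meanonly
    scalar male_min = r(min)
    scalar male_max = r(max)
    count if female & !missing(faminc) & (faminc < male_min | faminc > male_max)
    ```

    The final count shows how many females fall outside the observed male range, which is the quantity that determines whether the caveat above matters in practice.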



    • #3
      relrank by Ben Jann on SSC may be your friend: ssc install relrank. And note also the literature cited in the help file.
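
      To try this suggestion, the package can be installed from SSC and its help file opened as shown below. The command's options are documented in that help file and are deliberately not guessed at here.

      ```stata
      //    INSTALL THE USER-WRITTEN PACKAGE FROM SSC, THEN READ ITS HELP FILE
      ssc install relrank
      help relrank
      ```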



      • #4
        Originally posted by Clyde Schechter
        Clyde, thank you! I got the expected result by applying your code.



        • #5
          Originally posted by Stephen Jenkins
          relrank by Ben Jann on SSC may be your friend: ssc install relrank. And note also the literature cited in the help file.
          Thank you, Stephen. This command is very helpful as well!
