How to find the mean position of females in the male wage distribution?

Xiaoyang Xu

Join Date: Nov 2017

Posts: 3
#1

17 Nov 2017, 08:30

Hi all,
I am working on the gender wage gap. A paper I am reading describes one measure of the gap as the position of females in the male wage distribution: each woman is assigned her percentile in the male wage distribution, and these percentile rankings are then averaged. How can I do this in Stata?
I think I can use the cumul command to describe the male wage distribution, but I don't know how to find each woman's position in that cumulative distribution. Can anyone help? Thanks!

Xiaoyang

Last edited by Xiaoyang Xu; 17 Nov 2017, 08:40.
Tags: distribution, percentile
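Written out, the measure described in the question is the average of the male empirical cumulative distribution function evaluated at each female wage. The notation here is introduced only for illustration: $w^m_j$ ($j = 1,\dots,n_m$) are the male wages and $w^f_i$ ($i = 1,\dots,n_f$) are the female wages.

```latex
\hat F_m(w) = \frac{1}{n_m} \sum_{j=1}^{n_m} \mathbf{1}\{ w^m_j \le w \},
\qquad
\bar r_f = \frac{1}{n_f} \sum_{i=1}^{n_f} \hat F_m\bigl(w^f_i\bigr)
```

For example, with male wages 10, 20, 30, 40 and female wages 15 and 25, $\hat F_m(15) = 1/4$ and $\hat F_m(25) = 2/4$, so $\bar r_f = 0.375$. If the two distributions were identical and continuous, $\bar r_f$ would be close to 0.5, so values below 0.5 indicate that women are concentrated in the lower part of the male distribution.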
Clyde Schechter

Join Date: Apr 2014

Posts: 30089
#2

17 Nov 2017, 08:57

Here's an example of how this can be done:

Code:

clear*
webuse hsng

// CREATE A "SEX" VARIABLE JUST FOR ILLUSTRATION
gen byte female = mod(region, 2)
tabstat faminc, by(female)

// CALCULATE CUMULATIVE DISTRIBUTION OF MALES
cumul faminc if !female, gen(male_distrib)

// LOCATE FEMALES WITHIN MALE DISTRIBUTION
sort faminc
ipolate male_distrib faminc, epolate gen(overall)
replace overall = max(min(overall, 1), 0)
summ overall if female

Note that there can be (and in this example there are) values of family income among females that fall outside the observed range of the male distribution. Extrapolation can then produce values outside the 0-1 range, so the code simply replaces negative values with 0 and values greater than 1 with 1. A slightly more "sophisticated" approach would be to apply a logit transformation before interpolating and then back-transform afterward:

Code:

clear*
webuse hsng

// CREATE A "SEX" VARIABLE JUST FOR ILLUSTRATION
gen byte female = mod(region, 2)
tabstat faminc, by(female)

// CALCULATE CUMULATIVE DISTRIBUTION OF MALES
cumul faminc if !female, gen(male_distrib)

// LOCATE FEMALES WITHIN MALE DISTRIBUTION
// (logit(1) IS MISSING, SO THE TOP-RANKED MALE IS INTERPOLATED TOO)
gen logit_male_distrib = logit(male_distrib)
sort faminc
ipolate logit_male_distrib faminc, epolate gen(logit_overall)
gen overall = invlogit(logit_overall)
summ overall if female
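A variant that avoids interpolation and extrapolation altogether computes the male empirical distribution function directly as a step function: for each female, the fraction of males whose income is at or below hers. This is a minimal sketch on the same illustrative data; the variable names males_le and step_rank are hypothetical, and it assumes faminc has no missing values.

```stata
clear*
webuse hsng

// ILLUSTRATIVE "SEX" VARIABLE, AS IN THE EXAMPLES ABOVE
gen byte female = mod(region, 2)

// NUMBER OF MALES (ASSUMES NO MISSING faminc)
quietly count if !female
local n_male = r(N)

// SORT BY INCOME WITH MALES (female == 0) BEFORE FEMALES AT TIED INCOMES,
// SO THE RUNNING SUM COUNTS ALL MALES WITH faminc <= EACH FEMALE'S faminc
sort faminc female
gen long males_le = sum(!female)

// FRACTION OF MALES AT OR BELOW EACH FEMALE: ALWAYS BETWEEN 0 AND 1
gen step_rank = males_le / `n_male' if female
summ step_rank
```

Females below the lowest male income get exactly 0 and those at or above the highest get exactly 1, so no clamping is needed. Between observed male incomes, the step function and linear interpolation can give slightly different ranks.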

In truth, when there are female incomes outside the observed range of the male distribution, you are reduced to making some assumption about where they would fall if you had a more complete sample of males. In the absence of external information about the shape of the distribution, you are just flying blind on these observations. If there are many of them, there is a problem; if there are only a few, their influence on the result will probably be minimal.
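To judge whether the out-of-range cases are "many" or "a few", one can count them before choosing between clamping and the logit variant. A minimal sketch on the same illustrative data; the scalar and local names are made up for the example.

```stata
clear*
webuse hsng
gen byte female = mod(region, 2)

// OBSERVED RANGE OF THE MALE DISTRIBUTION (SCALARS KEEP FULL PRECISION)
summarize faminc if !female, meanonly
scalar male_min = r(min)
scalar male_max = r(max)

// FEMALES BELOW OR ABOVE THAT RANGE: THE CASES THAT NEED EXTRAPOLATION
quietly count if female & faminc < scalar(male_min)
local n_below = r(N)
quietly count if female & faminc > scalar(male_max) & !missing(faminc)
local n_above = r(N)

// SHARE OF ALL FEMALES THAT FALL OUTSIDE THE MALE RANGE
quietly count if female & !missing(faminc)
display "Share of females outside the male range: " (`n_below' + `n_above') / r(N)
```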
Stephen Jenkins

Join Date: Apr 2014

Posts: 1435
#3

17 Nov 2017, 09:04

relrank by Ben Jann, available from SSC, may be your friend: ssc install relrank. Note also the literature cited in its help file.
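A sketch of how relrank might be applied to the same illustrative data. It assumes the command takes the variable to be ranked as its varlist, the comparison distribution in a reference() option, and the new variable name in generate(); please confirm the exact syntax in help relrank after installing. The split variables faminc_f and faminc_m and the output relrank_f are hypothetical names.

```stata
* ssc install relrank   // ONE-TIME INSTALL FROM SSC
clear*
webuse hsng
gen byte female = mod(region, 2)

// HYPOTHETICAL SPLIT: FEMALE INCOMES TO BE RANKED, MALE INCOMES AS REFERENCE
gen faminc_f = faminc if female
gen faminc_m = faminc if !female

// RELATIVE RANK OF EACH FEMALE INCOME IN THE MALE DISTRIBUTION
// (OPTION NAMES ASSUMED; CHECK -help relrank-)
relrank faminc_f, reference(faminc_m) generate(relrank_f)
summ relrank_f
```

The mean reported by summ is then the average position of females in the male distribution.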
Xiaoyang Xu

Join Date: Nov 2017

Posts: 3
#4

17 Nov 2017, 11:26

Clyde, thank you! I got the expected result by applying your code.
Xiaoyang Xu

Join Date: Nov 2017

Posts: 3
#5

17 Nov 2017, 12:17

Thank you, Stephen. This command is very helpful as well!