Hello. I am using the "oaxaca" package from Ben Jann and I've read the related Stata Journal article (https://journals.sagepub.com/doi/pdf...867X0800800401). I'm using Stata/MP 16.1. Data are from the American Community Survey.
My goal is to explain the difference in log hourly wages between men and women (sex is coded male=1, female=2). Below is the output from a two-fold decomposition, both with and without exponentiated coefficients (the eform option).
Questions:
1. Should the values of "explained" and "unexplained" add up to the value of the difference when using exponentiated coefficients? If they do not, why might that be? In the Stata Journal article about oaxaca, in the examples of two-fold decomposition (including the example using exponentiated coefficients), the values of "explained" and "unexplained" sum to the difference. In this case, they do in the version without exponentiated coefficients, but not in the exponentiated version (17.92% unexplained plus negative 8.2% explained does not equal 8% difference, though it's close).
2. The value of "unexplained" is larger than the value of the difference. How should I interpret that?
3. The value of "explained" seems to suggest that, if women had the same values of the other predictors as men, their average earnings would be about 8% less than they currently are (exponentiated coefficients). How should I interpret this? As evidence that the effect of the predictors on wages is different for women than for men?
WITH EXPONENTIATED COEFFICIENTS
. oaxaca ln_hourly race_eth3 age age_sq marital2 has_child edatt4 stem2, by(sex) eform pooled s
> vy(,subpop(if $allstemfocal))
(running oaxaca on estimation sample)
BRR replications (80)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
..............................
Blinder-Oaxaca decomposition Number of obs = 373,559
Population size = 7,496,874
Subpop. no. obs = 7,545
Subpop. size = 168,977
Replications = 80
Design df = 79
------------------------------------------------------------------------------
| BRR *
ln_hourly | exp(b) Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
overall |
group_1 | 28.05908 .4327067 216.22 0.000 27.21088 28.93371
group_2 | 25.93025 .4280831 197.19 0.000 25.09202 26.79648
difference | 1.082098 .0233501 3.66 0.000 1.036605 1.129588
explained | .917648 .0113864 -6.93 0.000 .8952616 .9405942
unexplained | 1.179209 .0204376 9.51 0.000 1.139222 1.220598
-------------+----------------------------------------------------------------
explained |
race_eth3 | -.0082964 .0020902 -3.97 0.000 -.0124568 -.0041361
age | .0324024 .0244359 1.33 0.189 -.016236 .0810408
age_sq | -.0272928 .0206412 -1.32 0.190 -.068378 .0137925
marital2 | .0124589 .0026078 4.78 0.000 .0072683 .0176496
has_child | .001147 .000957 1.20 0.234 -.0007578 .0030518
edatt4 | -.0795574 .0063336 -12.56 0.000 -.0921642 -.0669507
stem2 | -.0168032 .0044118 -3.81 0.000 -.0255847 -.0080217
-------------+----------------------------------------------------------------
unexplained |
race_eth3 | -.1061984 .0314907 -3.37 0.001 -.1688791 -.0435178
age | -.4272993 .529228 -0.81 0.422 -1.480701 .6261027
age_sq | .2249712 .2747018 0.82 0.415 -.321809 .7717514
marital2 | .1002231 .0658463 1.52 0.132 -.0308407 .231287
has_child | .0379331 .0186572 2.03 0.045 .000797 .0750693
edatt4 | -.0093948 .0523455 -0.18 0.858 -.1135859 .0947963
stem2 | -.0844023 .053803 -1.57 0.121 -.1914946 .0226899
_cons | .4290109 .2617127 1.64 0.105 -.0919152 .949937
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.
WITHOUT EXPONENTIATED COEFFICIENTS
. oaxaca ln_hourly race_eth3 age age_sq marital2 has_child edatt4 stem2, by(sex) pooled svy(,su
> bpop(if $allstemfocal))
(running oaxaca on estimation sample)
BRR replications (80)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5
.................................................. 50
..............................
Blinder-Oaxaca decomposition Number of obs = 373,559
Population size = 7,496,874
Subpop. no. obs = 7,545
Subpop. size = 168,977
Replications = 80
Design df = 79
------------------------------------------------------------------------------
| BRR *
ln_hourly | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
overall |
group_1 | 3.334312 .0154213 216.22 0.000 3.303617 3.365007
group_2 | 3.25541 .016509 197.19 0.000 3.22255 3.28827
difference | .0789021 .0215785 3.66 0.000 .0359511 .121853
explained | -.0859414 .0124082 -6.93 0.000 -.1106394 -.0612435
unexplained | .1648435 .0173316 9.51 0.000 .1303458 .1993413
-------------+----------------------------------------------------------------
explained |
race_eth3 | -.0082964 .0020902 -3.97 0.000 -.0124568 -.0041361
age | .0324024 .0244359 1.33 0.189 -.016236 .0810408
age_sq | -.0272928 .0206412 -1.32 0.190 -.068378 .0137925
marital2 | .0124589 .0026078 4.78 0.000 .0072683 .0176496
has_child | .001147 .000957 1.20 0.234 -.0007578 .0030518
edatt4 | -.0795574 .0063336 -12.56 0.000 -.0921642 -.0669507
stem2 | -.0168032 .0044118 -3.81 0.000 -.0255847 -.0080217
-------------+----------------------------------------------------------------
unexplained |
race_eth3 | -.1061984 .0314907 -3.37 0.001 -.1688791 -.0435178
age | -.4272993 .529228 -0.81 0.422 -1.480701 .6261027
age_sq | .2249712 .2747018 0.82 0.415 -.321809 .7717514
marital2 | .1002231 .0658463 1.52 0.132 -.0308407 .231287
has_child | .0379331 .0186572 2.03 0.045 .000797 .0750693
edatt4 | -.0093948 .0523455 -0.18 0.858 -.1135859 .0947963
stem2 | -.0844023 .053803 -1.57 0.121 -.1914946 .0226899
_cons | .4290109 .2617127 1.64 0.105 -.0919152 .949937
------------------------------------------------------------------------------
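As a quick check on question 1: assuming eform simply reports exp() of the log-scale estimates in the "overall" block (which the note "Estimates are transformed only in the first equation" seems to suggest), the exponentiated components would combine by multiplication rather than by addition. The display commands below are only a sketch of that arithmetic, using values copied from the two outputs above. They are not part of the oaxaca command itself.

```stata
* Log scale: explained + unexplained should equal the difference (.0789021)
display -.0859414 + .1648435

* Exponentiated scale: the components multiply to the ratio group_1/group_2
display exp(-.0859414) * exp(.1648435)
display exp(.0789021)
display 28.05908 / 25.93025

* Percentage readings of the exponentiated values: 100*(exp(b) - 1)
display 100 * (exp(-.0859414) - 1)
display 100 * (exp(.1648435) - 1)
display 100 * (exp(.0789021) - 1)
```

If that reading is right, .917648 times 1.179209 should reproduce the reported 1.082098 up to rounding, and the percentage versions of "explained" and "unexplained" would not be expected to sum to the percentage difference.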