Oaxaca Decomposition Interpretation

Seyda Coban

Join Date: Dec 2017

Posts: 6
#1

23 Dec 2017, 07:07

Hello everyone,

I am expected to apply the Oaxaca decomposition method for my graduation project. The subject is gender-based wage discrimination. I ran it in Stata and got results for both the threefold and the twofold decomposition, but there are two things I am confused about.

1) When should I use the twofold and when the threefold decomposition? What's the difference between them? I couldn't get the detailed decomposition when I applied the threefold decomposition, which is why I used the twofold one instead. Does this create a problem? I mean, what should I say to my advisor when he asks me why I chose twofold instead of threefold? I need a reasonable explanation for that.

2) I now have the results for the twofold decomposition, but I don't know how to interpret them. What does the minus sign mean in the explained part?

The Stata output for the decomposition is attached below. Can anyone help me with the interpretation? It's a bit urgent.

Any help would be greatly appreciated!
Attached Files

ATTACHED.docx (1,011.7 KB)

Last edited by Seyda Coban; 23 Dec 2017, 07:32.
Tags: genderpaygap, genderwagegap, laboreconomics, oaxaca, oaxacadecomposition
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17714
#2

23 Dec 2017, 07:45

Seyda:
you might be interested in http://www.worldbank.org/en/topic/he...ld-survey-data, Chapter 12.
Chapter 12 has its own .do file available at: http://siteresources.worldbank.org/I...69249/AHE12.do.

Kind regards,
Carlo
(Stata 19.0)
Seyda Coban

Join Date: Dec 2017

Posts: 6
#3

23 Dec 2017, 08:14

Thanks Carlo! I will check it now. But I think this source doesn't mention the twofold and threefold types of decomposition. Do you have any information about my first question? What should I tell my advisor when he asks me why I chose twofold instead of threefold?
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#5

23 Dec 2017, 08:57

In reply to Seyda Coban (#1)

Seyda,

Per the FAQ, Statalist policy is not to post Microsoft Word or Excel files; not everyone has Microsoft, and moreover some people have organizational policies that forbid them from opening such files for fear of viruses. It's not you, it's the world we live in.

I'll assume you used Ben Jann's -oaxaca- command, available on SSC. He also has a Stata Journal article explaining the differences between the twofold and threefold decompositions. There is some matrix algebra in there, and it was a bit opaque to me at first, but here is the explanation I came up with for my own use.

Whatever the type of decomposition, you often find that part of the disparity is due to differences in the levels of the observed variables (which you probably already knew). In the twofold decomposition, you are saying that if the observed variables have the same effect in each group (e.g. if the returns to education are identical for women and men), differences in their levels would explain x% of the observed disparity.

In the threefold decomp, you are also attempting to estimate the effect of a disparity in returns to the observed covariates. In the twofold decomp, this disparity (if it exists) gets folded into the "unexplained" portion of the disparity.
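Written out as equations, the two versions look like this, following the notation of Jann's article as I understand it (assumptions: group A = men, group B = women, each X vector includes a constant, and β* is whatever reference coefficient vector the twofold version uses, e.g. coefficients from a pooled regression):

```latex
% Gap in mean outcomes between group A (men) and group B (women)
R = E(Y_A) - E(Y_B) = E(X_A)'\beta_A - E(X_B)'\beta_B

% Threefold: endowments (E) + coefficients (C) + interaction (I)
R = \underbrace{[E(X_A)-E(X_B)]'\beta_B}_{E}
  + \underbrace{E(X_B)'(\beta_A-\beta_B)}_{C}
  + \underbrace{[E(X_A)-E(X_B)]'(\beta_A-\beta_B)}_{I}

% Twofold with reference coefficients beta*: explained (Q) + unexplained (U)
R = \underbrace{[E(X_A)-E(X_B)]'\beta^*}_{Q}
  + \underbrace{E(X_A)'(\beta_A-\beta^*) + E(X_B)'(\beta^*-\beta_B)}_{U}
```

Because U collects every term involving coefficient differences (including a difference in intercepts), any group difference in returns ends up in the unexplained part of the twofold version.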

Why choose one or the other? I am not familiar enough with the technique to say. I have only used it once, for a particular project; my economist colleague recommended the twofold decomposition, and I forget why he preferred it in that case. We did have a lot of explanatory variables, and we had no a priori reason to think that most of them had different effects in the two groups, so it could be that. That said, the oaxaca command doesn't appear to allow some, but not all, explanatory variables to vary, and I wonder whether this is a limitation of the command: we do have one covariate whose effect we think may vary between groups.
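In -oaxaca-, the choice between the two is made through options. A minimal sketch on the oaxaca.dta example dataset from SSC (assumptions: the command is already installed, and the covariate list is shortened for brevity): the threefold decomposition is the default, and the -pooled- option requests a twofold decomposition whose reference coefficients come from a regression pooled over both groups.

Code:

```stata
* Assumes Ben Jann's user-written command is installed: ssc install oaxaca
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear

* Threefold decomposition (the default):
* endowments, coefficients, interaction
oaxaca lnwage educ exper tenure, by(female)

* Twofold decomposition (explained vs. unexplained), using
* coefficients from a pooled model over both groups as the reference
oaxaca lnwage educ exper tenure, by(female) pooled
```

Other reference choices for the twofold version, such as -weight()- or -omega-, are documented in the command's help file; which reference to use is a substantive modelling decision rather than a technical default.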

Your second question is what it means when there's a minus sign in front of the explained portion. I came across one such situation in that project. There, given the differences in observed covariates, we would predict that our minority group should be much better off than we observed; in fact, if we believe our model, the disparity should favor the minority group. We do observe a disparity in favor of the majority, but it can't be explained by observables.

A parallel situation: for one other dependent variable, we explained more than 100% of the disparity through observables. That is, given the observed levels of our independent variables, minorities should be even worse off than they are (again, if our model is correct). That dependent variable is a subjective score (on mood), so one possible explanation is differential item functioning.
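Both situations come down to simple ratios. A sketch with made-up numbers (hypothetical, not from either project), writing the total gap as R = Q + U, where Q is the explained part and U the unexplained part:

```latex
\text{explained share} = \frac{Q}{R}, \qquad
\text{unexplained share} = \frac{U}{R} = 1 - \frac{Q}{R}

% Case 1: negative explained part
R = 0.20,\; Q = -0.05 \;\Rightarrow\; U = 0.25,\; Q/R = -25\%,\; U/R = 125\%

% Case 2: explained part larger than the gap
R = 0.20,\; Q = 0.30 \;\Rightarrow\; U = -0.10,\; Q/R = 150\%,\; U/R = -50\%
```

A negative explained part therefore always pushes the unexplained share above 100%, and an explained share above 100% always comes with a negative unexplained part.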

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Seyda Coban

Join Date: Dec 2017

Posts: 6
#6

23 Dec 2017, 11:07

In reply to Weiwen Ng (#5)

Thanks Weiwen! I first tried to attach an image of the output, but the system gave me an error and, interestingly, didn't accept the .jpg extension. That's why I thought I could put the image in a Word file. I didn't think much about it; sorry.

I will try to apply some additional tests to check whether the model is correct, because according to the summary statistics, women earn lower wages than men although they have higher levels of education (that holds for the private sector, by the way). So I think there should be discrimination against women. I am very surprised by the result, though.

My model includes tenure, working hours (monthly), age, education, nationality, region, union membership, the number of workers in the workplace, and the way the job was found (e.g. through a labor office, or with the help of family or a friend). The last one is significant in the model, so I can clearly say that there is preferential treatment of different people where I live. I will try to find additional variables now, but does any variable come to mind? Any suggestions? Or any additional test to check the model?
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#7

23 Dec 2017, 14:03

In reply to Seyda Coban (#6)

You said that women have higher levels of education than men, yet they earn less than men. In fact, you mentioned that you got a negative explained disparity with the twofold decomposition. Remember, in the twofold model you're averaging the effect of each observable variable across the groups, and you're assuming that both groups experience that average effect. I'm wondering if a threefold decomposition might be warranted.
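To make the components concrete, a one-covariate illustration with hypothetical numbers (assumptions: education is the only regressor, both groups share the same intercept, men are group A and women group B). Men average 12 years of schooling with a return of 0.08 log points per year; women average 13 years with a return of 0.06. The threefold version splits the gap into endowments (E), coefficients (C) and interaction (I); the twofold version into explained (Q) and unexplained (U):

```latex
R = 12(0.08) - 13(0.06) = 0.96 - 0.78 = 0.18

% Threefold (women's coefficients as the base)
E = (12-13)(0.06) = -0.06
C = 13(0.08-0.06) = 0.26
I = (12-13)(0.08-0.06) = -0.02
E + C + I = -0.06 + 0.26 - 0.02 = 0.18

% Twofold with a hypothetical reference return beta* = 0.07
Q = (12-13)(0.07) = -0.07
U = 12(0.08-0.07) + 13(0.07-0.06) = 0.12 + 0.13 = 0.25
Q + U = -0.07 + 0.25 = 0.18
```

Because women have more schooling here, the endowment/explained part is negative in both versions, the same sign pattern described above. The threefold version reports the -0.02 interaction separately; in this twofold example it shows up split between Q and U.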

I was able to obtain a detailed decomposition in a threefold model using a sample dataset. It would help if you posted the command you typed, using the code delimiters. Look for the # button on the formatting toolbar (towards the right of the row of buttons, between the buttons marked " and <>).

Some sample code, which you can cut and paste into Stata:

Code:

use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
oaxaca lnwage age single educ exper tenure, by(female)
oaxaca lnwage age single educ exper tenure, by(female) detail(demographic: age single educ, laborforce: exper tenure)

That said, I've not really been able to understand the "interaction" section of the threefold model. I understand that it's the interaction between the differences in the levels of the initial endowments and the differing returns to those endowments (i.e. the coefficients section). This is something to ask your advisor; I do not use Oaxaca decomposition regularly at all. One initial thought was that you could fit a linear model in which you interact female with all the covariates, e.g.

Code:

regress lnwage i.female##(i.single c.educ c.exper c.tenure)

Do that in the example above and you'll see that some of the interaction coefficients are significant, i.e. we can reject the null hypothesis that women and men have the same returns to some of the covariates above. An alternative is a Chow test (I believe this is a simple Wald test that the coefficients differ between groups). See the link below.

https://www.stata.com/support/faqs/s...cs/chow-tests/
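A minimal sketch of that idea on the same example data (assumption: a joint Wald test with Stata's official -testparm- after the fully interacted regression is used here as the Chow-type test; the FAQ linked above discusses Chow tests in more detail):

Code:

```stata
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear

* Fully interacted model: intercept and every slope may differ by sex
regress lnwage i.female##(i.single c.educ c.exper c.tenure)

* Joint test that all slope interactions are zero,
* i.e. women and men share the same returns to these covariates
testparm i.female#i.single i.female#c.educ i.female#c.exper i.female#c.tenure

* Adding the female main effect also tests the intercept shift,
* i.e. that all coefficients are identical across the two groups
testparm i.female i.female#i.single i.female#c.educ i.female#c.exper i.female#c.tenure
```

A small p-value in the first test suggests that returns differ between groups, which is the situation in which the coefficients and interaction parts of the threefold decomposition carry substantive information.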

Last edited by Weiwen Ng; 23 Dec 2017, 14:06.

Seyda Coban

Join Date: Dec 2017

Posts: 6
#8

23 Dec 2017, 15:56

In reply to Weiwen Ng (#7)

Thanks a lot, Weiwen! I will try to figure it out with the code you sent.
Dung Le

Join Date: May 2018

Posts: 120
#9

15 Jul 2019, 07:17

In reply to Weiwen Ng (#5)

Hi Weiwen Ng,

I am replying to your post to make sure that I understand your interpretation of the twofold decomposition results correctly. Take my case as an example:
- Mean disability score for men: 2.3
- Mean disability score for women: 3.4
- Diff in mean disability score: -0.9
- Due to endowment: -0.5
- Due to coefficient: -0.4

As for the explained component, education (-0.25) and employment status (-0.15) contribute significantly to gender inequality in disability. Is it correct to interpret this as: if women had the same educational level as men, gender inequality in disability would be reduced by 50%?

Regarding the endowment effect, is it correct to say that if women had the same characteristics as men, gender inequality in disability would have been 0.5 points lower?
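Working only from the figures listed above (and treating the gap as the sum of the two reported components, -0.5 + -0.4 = -0.9), the ratios in question work out as follows; this is a sketch of the arithmetic, not a verdict on the interpretation:

```latex
% Education's share of the explained (endowment) part
\frac{-0.25}{-0.5} = 0.50

% Education's share of the total gap
\frac{-0.25}{-0.9} \approx 0.28

% Gap remaining if the endowment part were removed
-0.9 - (-0.5) = -0.4
```

So a 50% figure refers to the explained part only; relative to the whole gap, the education term is closer to 28%.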

Thank you!

Dung Le