Heckman selection model with Blinder-Oaxaca Decomposition

Sven-Kristjan Bormann

Join Date: Jul 2018

Posts: 310
#16

07 Apr 2020, 17:33

You could see if bootstrapping the standard errors via

Code:

bootstrap :oaxaca logGRSSWK $wageeq if inlist(DISTYPE,1,4) & quarter == 1, by(DISTYPE) model1(heckman, twostep select($seleq)) model2(heckman, twostep select($seleq)) weight(0) noisily relax

gives you different results.
You could also run a simple t-test to see if there are any differences between the groups:

Code:

ttest logGRSSWK if inlist(DISTYPE,1,4) & quarter==1,by(DISTYPE)

Maybe there are no wage differences to begin with.
Comment

Will Murphy

Join Date: Feb 2020
Posts: 52

#17

08 Apr 2020, 03:50

Thank you very much for the suggestions.
Running the latter t-test suggested the wage difference was significant at the 5% level:

Code:

 diff = mean(WLD) - mean(Non-disa)                             t =  -1.9896
Ho: diff = 0                                     degrees of freedom =      564

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0236         Pr(|T| > |t|) = 0.0471          Pr(T > t) = 0.9764

Thus I believe this suggests that the wage decomposition is warranted.

However, bootstrapping the standard errors led to:

Code:

Bootstrap replications (50)
----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    50
insufficient observations to compute bootstrap standard errors
no results will be saved
r(2000);

Perhaps this is because there are only 58 observations for WLD in quarter 1, so the insignificant p-values could be attributed to the small sample?
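
A row of "x" means that every replication failed, so it may help to look at the underlying error before blaming the sample size alone. The following is only a sketch under assumptions, not a confirmed fix: it reuses the command from #16, relies on the noisily option of the bootstrap prefix to print each replication's output, and assumes that oaxaca's own vce() option accepts bootstrap when heckman models are used (worth checking in the oaxaca help file).

```stata
* Sketch only; run from a do-file so the /// line continuations work.
* reps(3) keeps the output short; noisily prints the output of each
* replication, which should reveal the error behind the "x" marks.
bootstrap, reps(3) noisily: oaxaca logGRSSWK $wageeq ///
    if inlist(DISTYPE,1,4) & quarter == 1, by(DISTYPE) ///
    model1(heckman, twostep select($seleq)) ///
    model2(heckman, twostep select($seleq)) weight(0) relax

* Alternative (assumption: vce(bootstrap) is supported in this setup):
* let oaxaca handle the resampling itself.
oaxaca logGRSSWK $wageeq if inlist(DISTYPE,1,4) & quarter == 1, ///
    by(DISTYPE) model1(heckman, twostep select($seleq)) ///
    model2(heckman, twostep select($seleq)) weight(0) relax ///
    vce(bootstrap, reps(50))
```

If the first command stops with an error, the message printed for the replications is the thing to read; the second command can then be run on its own.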

Comment

Will Murphy

Join Date: Feb 2020
Posts: 52

#18

08 Apr 2020, 04:59

Alternatively, could it be because of the way I am treating categorical variables? I.e.:

Code:

global wageeq "normalize(b.dWHITE1 dWHITE2) normalize(b.dAGE1 dAGE2 dAGE3 dAGE4 dAGE5) normalize(dRESIDENCE1 dRESIDENCE2 dRESIDENCE3 dRESIDENCE4 dRESIDENCE5 b.dRESIDENCE6)..."

global seleq "dWHITE1 dWHITE2 dAGE1 dAGE2 dAGE3 dAGE4 dAGE5 dRESIDENCE1 dRESIDENCE2 dRESIDENCE4 dRESIDENCE5 dRESIDENCE6..."

Sorry, I am just trying to get my head round problem after problem, as I am unable to change datasets.

Comment

Will Murphy

Join Date: Feb 2020

Posts: 52
#19

08 Apr 2020, 05:27

(This post, #19, was cleared via editing: it arose from a coding error that, once corrected, made the post redundant.) My queries in #17 and #18 still stand, and any advice or clarification would be greatly appreciated.

Last edited by Will Murphy; 08 Apr 2020, 05:39.
Comment
Sven-Kristjan Bormann

Join Date: Jul 2018

Posts: 310
#20

08 Apr 2020, 10:06

Originally posted by Will Murphy

Perhaps this is because there are only 58 observations for WLD in quarter 1, so the insignificant p-values could be attributed to the small sample?

This could indeed be the case.

Originally posted by Will Murphy

Alternatively, could it be because of the way I am treating categorical variables?

Without having seen your dataset, this is difficult to answer. However, the difference reported in your post #15 is the raw difference between the groups, based on individuals without missing values for each variable.
Maybe you can rerun the t-test with something like this:

Code:

ttest logGRSSWK if inlist(DISTYPE,1,4) & quarter==1 & !mi(WHITE,AGE,RESIDENCE),by(DISTYPE)

and see if the difference is still significant. You can also add more of your independent variables to !mi(...).

A side note: you could convert your variable names to lowercase to make your life and mine easier when typing the commands.

Code:

rename *,lower
Comment
Will Murphy

Join Date: Feb 2020

Posts: 52
#21

08 Apr 2020, 13:47

Thanks for the advice. All my posts in this thread were based on a sample restricted to males (i.e. female observations were dropped). Including both genders (i.e. adding a gender dummy variable to my wage and selection equations) has lowered the p-values to an extent and provides some form of a solution.

When I ran the above t-tests, with and without a gender control variable, adding variables to !mi(...) made no change at all to the significance of the difference, which further points me towards a small-sample problem.

Thank you for the invaluable help, it has been greatly appreciated. All the best.

Last edited by Will Murphy; 08 Apr 2020, 14:44.
Comment
Will Murphy

Join Date: Feb 2020

Posts: 52
#22

09 Apr 2020, 00:33

Apologies, if I may ask one more question.

Much of the related literature either estimates separately by gender or assesses only one gender, in order to keep the effects of gender out of the measures. I did that previously by assessing only males but, as I mentioned, adding females to my wage and selection equations lowers my p-values considerably, i.e.:

Code:

global wageeq "dAGE1 dAGE2 dAGE3 ... dFEMALE1 dFEMALE2"

Does assessing how much gender (i.e. dFEMALE1 dFEMALE2) contributes to the explained gap, which appears to be small and statistically insignificant, not account for the effects of gender? If it doesn't, are there any alternatives, e.g. introducing interaction terms?
I ask because I wouldn't want to analyse only one gender (or each gender separately) and obtain very large p-values again. Many thanks.

Last edited by Will Murphy; 09 Apr 2020, 00:49.
Comment
Sven-Kristjan Bormann

Join Date: Jul 2018

Posts: 310
#23

09 Apr 2020, 11:57

Why are you concerned with the lack of significance? Your sample is rather small, so I would expect to see insignificant values, especially if the standard deviations of key variables are large.
I would calculate the decomposition separately by gender to avoid dealing directly with the gender wage gap.
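
A minimal sketch of what a by-gender run could look like, reusing the command from #16. It assumes that dFEMALE1 and dFEMALE2 are complementary 0/1 indicators for the two genders (their exact coding is not shown in this thread) and that the gender dummies have been removed from $wageeq and $seleq first, since they are constant within each subsample.

```stata
* Sketch only; run from a do-file so the /// line continuations work.
* Assumption: dFEMALE1 and dFEMALE2 each flag one gender with the value 1.
foreach g of varlist dFEMALE1 dFEMALE2 {
    display as text _newline "Decomposition for subsample with `g' == 1"
    oaxaca logGRSSWK $wageeq ///
        if inlist(DISTYPE,1,4) & quarter == 1 & `g' == 1, ///
        by(DISTYPE) model1(heckman, twostep select($seleq)) ///
        model2(heckman, twostep select($seleq)) weight(0) relax
}
```

Each subsample is smaller than the pooled one, so even larger standard errors would not be surprising.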

Originally posted by Will Murphy

By assessing how much gender (i.e. dFEMALE1 dFEMALE2) contributes to the explained gap (which appears to be relatively insignificant (in coefficient and p-value)) does this not account for the effects of gender? If this doesn't, are there any alternatives, e.g. introducing interaction terms?

I would need to see your results to say anything meaningful. Of course, you can try out different specifications and see what happens as part of a robustness check or something similar.

Originally posted by Will Murphy

Because I wouldn't want to only account for one/each gender and obtain very insignificant p-values again.

If you obtain large p-values and your main specification is (theoretically) reasonable, then I would report these results.
Comment
Will Murphy

Join Date: Feb 2020

Posts: 52
#24

09 Apr 2020, 12:51

Originally posted by Sven-Kristjan Bormann

If you obtain large p-values and your main specification is (theoretically) reasonable, then I would report these results.

Thank you very much. Sorry for my incompetence, but just to check quickly: by my main specification being reasonable, do you mean that oaxaca is suitable for what I wish to do (i.e. decompose the wage gap)?
Comment
Sven-Kristjan Bormann

Join Date: Jul 2018

Posts: 310
#25

09 Apr 2020, 13:12

By reasonable, I mean that you have included all variables from your dataset that seem sensible to include.
But it could also be that the Blinder-Oaxaca decomposition itself is not suitable for your research problem. Like an ordinary OLS regression, the Blinder-Oaxaca decomposition assumes a relationship between the variables that is linear in parameters, and it calculates the effects at the means of the variables.
Maybe there are differences in other parts of the distribution but not at the mean.
I don't want to scare you; I am just pointing out what else to consider. Usually the BO decomposition is reasonable, and it is often used for wage decompositions.
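
A quick, purely descriptive way to look beyond the mean is to compare a few quantiles of the log wage across the two groups. The sketch below uses the same sample restriction as the t-test earlier in the thread; the choice of quantiles is only an illustration, and with about 58 observations in one group the tail quantiles will be noisy.

```stata
* Sketch only; run from a do-file so the /// line continuation works.
* Compare group sizes, means and selected quantiles of log wages by DISTYPE.
tabstat logGRSSWK if inlist(DISTYPE,1,4) & quarter == 1, ///
    by(DISTYPE) statistics(n mean p10 p25 p50 p75 p90)
```

This is not a decomposition; it only shows whether the raw gap looks different away from the mean.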
Comment
Will Murphy

Join Date: Feb 2020

Posts: 52
#26

10 Apr 2020, 01:51

No, that makes perfect sense, thank you. I would use an extension of the B-O decomposition, i.e. the RIF decomposition, but with a relatively small sample this is likely to lead to further problems. Thank you very much for all your help.
Comment
