Hi all,
I am trying to interpret the results of an Oaxaca-Blinder decomposition, which I ran with the popular oaxaca command. For my project I am only interested in a two-fold decomposition (i.e. I don't want the interaction term), so I am using the pooled option. The results are based on a logistic regression. My commands and output are shown below.
I am seeking help because, in these results, there is a significant overall difference between the two groups (urban and rural in this case), with roughly 24.9% of the difference coming from differences in composition (the explained part, p = 0.073, not significant at the 5% level) and 75.1% coming from differences in coefficients (the unexplained part, p = 0.034, significant). On the surface this makes sense, but when you look at the individual variables within the explained and unexplained portions, the findings don't line up.
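For reference, here is a minimal sketch of how I arrived at those shares. The three numbers are typed in by hand from the "overall" block of my output (difference, explained, unexplained); the scalar names are just ones I made up for this illustration. The two displayed shares come out at about 24.9 and 75.1.

```stata
* Values copied by hand from the "overall" block of the output
scalar diff   = -.0793678
scalar expl   = -.0197445
scalar unexpl = -.0596233

* Each component as a percentage of the total difference
display %4.1f 100 * expl / diff
display %4.1f 100 * unexpl / diff
```

These are simple ratios of point estimates, so they say nothing about the standard errors, which is the part I am unsure about.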
Specifically, the only significant variable (educ4) is in the explained portion, and no individual variable is significant in the unexplained portion, even though the unexplained component is significant overall.
Am I misinterpreting the results? Or, on a more technical level, how are standard errors and p-values calculated within an Oaxaca-Blinder decomposition? Is the calculation for the overall explained/unexplained components different from the calculation for the contributions of individual variables?
Please let me know if I can clarify anything.
Code:
. svyset [pweight = weight], str(stratum_var) psu(cluster_var)
. tab race, gen(race)
. tab educ, gen(educ)
. tab hhinc, gen(hhinc)
. tab age_g5, gen(age_g5)
. oaxaca cohab_par age_g52 age_g53 age_g54 age_g55 race2 race3 race4 imm ///
>     educ2 educ3 educ4 hhinc2 hhinc3 hhinc4, ///
>     by(rural) svy logit pooled

Blinder-Oaxaca decomposition

Number of strata = 18                    Number of obs   =      3,268
Number of PSUs   = 72                    Population size = 38,244,320
                                         Design df       =         54
                                         Model           =      logit
Group 1: rural = 0                       N of obs 1      =       2116
Group 2: rural = 1                       N of obs 2      =        468

--------------------------------------------------------------------------------
             |              Linearized
   cohab_par | Coefficient  std. err.        t   P>|t|      [95% conf. interval]
-------------+------------------------------------------------------------------
overall      |
     group_1 |    .1405385   .0141326     9.94   0.000      .1122043    .1688727
     group_2 |    .2199063   .0242979     9.05   0.000      .1711919    .2686207
  difference |   -.0793678   .0276199    -2.87   0.006     -.1347424   -.0239933
   explained |   -.0197445   .0107814    -1.83   0.073     -.0413599    .0018708
 unexplained |   -.0596233   .0273862    -2.18   0.034     -.1145294   -.0047172
-------------+------------------------------------------------------------------
explained    |
     age_g52 |   -.0005991      .0012    -0.50   0.620     -.0030048    .0018067
     age_g53 |       -.001    .001295    -0.77   0.443     -.0035962    .0015963
     age_g54 |    .0000969   .0009533     0.10   0.919     -.0018143    .0020081
     age_g55 |   -.0004468   .0010476    -0.43   0.671     -.0025472    .0016535
       race2 |    -.001195   .0013982    -0.85   0.396     -.0039982    .0016081
       race3 |    .0009479     .00266     0.36   0.723     -.0043851    .0062809
       race4 |    .0005973   .0013599     0.44   0.662      -.002129    .0033237
         imm |    .0003432   .0019318     0.18   0.860     -.0035298    .0042161
       educ2 |   -.0002301   .0009301    -0.25   0.806     -.0020949    .0016347
       educ3 |    .0008762   .0017989     0.49   0.628     -.0027304    .0044828
       educ4 |    -.012033    .005119    -2.35   0.022      -.022296     -.00177
      hhinc2 |   -.0003015   .0006636    -0.45   0.651     -.0016318    .0010289
      hhinc3 |   -.0001882   .0006291    -0.30   0.766     -.0014493     .001073
      hhinc4 |   -.0066125   .0039993    -1.65   0.104     -.0146306    .0014057
-------------+------------------------------------------------------------------
unexplained  |
     age_g52 |    -.028689   .0165345    -1.74   0.088     -.0618386    .0044607
     age_g53 |   -.0297615   .0290941    -1.02   0.311     -.0880916    .0285686
     age_g54 |   -.0556364   .0411148    -1.35   0.182     -.1380666    .0267938
     age_g55 |   -.0026016   .0261534    -0.10   0.921      -.055036    .0498328
       race2 |    .0048568   .0077741     0.62   0.535     -.0107293    .0204428
       race3 |    .0176802   .0161501     1.09   0.278     -.0146989    .0500593
       race4 |    .0041202   .0058635     0.70   0.485     -.0076355    .0158759
         imm |   -.0093887    .012094    -0.78   0.441     -.0336357    .0148583
       educ2 |    .0047812   .0141557     0.34   0.737     -.0235994    .0331617
       educ3 |   -.0291882    .026426    -1.10   0.274     -.0821691    .0237926
       educ4 |   -.0366389   .0235373    -1.56   0.125     -.0838284    .0105505
      hhinc2 |    .0017072   .0118416     0.14   0.886     -.0220339    .0254482
      hhinc3 |    .0011674   .0093851     0.12   0.901     -.0176486    .0199834
      hhinc4 |   -.0020982   .0118124    -0.18   0.860     -.0257806    .0215842
       _cons |    .1000664    .098881     1.01   0.316     -.0981782    .2983109
--------------------------------------------------------------------------------