Simple question on OLS & Oaxaca decomposition

Maria Mendez

Join Date: Sep 2018

Posts: 4
#1

05 Mar 2019, 12:41

Dear StataList,

I have a question, if you would be so kind as to respond. I ran an OLS regression to estimate the change in hours worked between 2010 and 2018, controlling for other factors. My coefficient is 8 minutes. However, when I run Oaxaca to decompose these 8 minutes into an explained and an unexplained part, Oaxaca gives me a total change of 12 minutes.

Is that possible? Or should Oaxaca also give me the same 8 minutes rather than a different number?

I would very much appreciate any hint. Thank you so much.
FernandoRios

Join Date: Apr 2014

Posts: 2492
#2

05 Mar 2019, 21:01

Hi Maria
Could you provide more details of the problem? In particular, can you post the results you obtain from oaxaca and from your OLS estimation?
In any case, the results from oaxaca and from OLS will not necessarily be the same, because they are two different models.
The only case I can think of that gives you the same value every time is when the OLS gap is estimated with no controls.
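To make that last point concrete, here is a minimal sketch in plain Python (not Stata, and not your data). The six observations and the names d, x and y are made up purely for illustration: y is built as exactly 10 + 2*x + 1*d, so the raw gap in means and the coefficient on the year dummy can be worked out by hand.

```python
# Toy data (hypothetical, not from any survey): y = 10 + 2*x + 1*d exactly.
d = [0, 0, 0, 1, 1, 1]            # year/group dummy
x = [1, 2, 3, 3, 4, 5]            # a single control variable
y = [12, 14, 16, 17, 19, 21]      # outcome

def mean(v):
    return sum(v) / len(v)

def cross(a, b):
    """Sum of cross products of deviations from the means of a and b."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))

# Raw gap in group means: the total difference a decomposition starts from
# (up to the sign convention for which group is subtracted from which).
raw_gap = (mean([yi for yi, di in zip(y, d) if di == 1])
           - mean([yi for yi, di in zip(y, d) if di == 0]))

# OLS of y on the dummy alone: slope = S_dy / S_dd.
b_no_controls = cross(d, y) / cross(d, d)

# OLS of y on the dummy and x: closed-form coefficient on d with two regressors.
det = cross(d, d) * cross(x, x) - cross(d, x) ** 2
b_with_control = (cross(x, x) * cross(d, y) - cross(d, x) * cross(x, y)) / det

print(raw_gap, b_no_controls, b_with_control)   # 5.0 5.0 1.0
```

In this toy example the OLS gap without controls reproduces the raw difference in means (5), while adding the control changes the dummy coefficient (1), so the two numbers need not agree once controls enter.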
HTH
Fernando

Maria Mendez

Join Date: Sep 2018
Posts: 4
#3

06 Mar 2019, 10:54

Dear Fernando,

Thank you so much for your response. So I should not worry about the different results from the Oaxaca and OLS models.

I would be very grateful if Fernando (or someone else) could answer this question. Below you can find one of the outputs of my Oaxaca analysis of changes in women's housework time (in minutes). The "explained" part is clear to interpret: changes in composition. For example, women would dedicate -3 minutes in 2018 if they had the characteristics of those in 2010, right? But how do you interpret the "unexplained" part for the control variables? For example, the age of the child appears significant in the unexplained part. How would you interpret that?

Many many thanks.
All the best

Code:

Blinder-Oaxaca decomposition                    Number of obs     =      5,566
                                                  Model           =     linear
Group 1: year18 = 0                               N of obs 1      =       1762
Group 2: year18 = 1                               N of obs 2      =       3804

-------------------------------------------------------------------------------
              |               Robust
    housework |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------+----------------------------------------------------------------
overall       |
      group_1 |   85.91373   2.354908    36.48   0.000      81.2982    90.52927
      group_2 |   82.14774   1.535938    53.48   0.000     79.13736    85.15812
   difference |   3.765995   2.811529     1.34   0.180    -1.744501    9.276491
    explained |  -2.405477   1.835476    -1.31   0.190    -6.002943    1.191989
  unexplained |   6.171472   2.556521     2.41   0.016     1.160783    11.18216
--------------+----------------------------------------------------------------
explained     |
   unempinact |  -3.010797   .4898713    -6.15   0.000    -3.970927   -2.050667
     parttime |  -.0748738   .4519227    -0.17   0.868     -.960626    .8108783
   parttime_p |  -.0739686   .1913108    -0.39   0.699    -.4489309    .3009938
 unempinact_p |  -1.041083   .3122599    -3.33   0.001    -1.653102   -.4290653
      highedu |   2.104105   .4339075     4.85   0.000     1.253662    2.954549
    highedu_p |   .0964902   .2182272     0.44   0.658    -.3312274    .5242077
          age |   1.001825   2.488688     0.40   0.687    -3.875913    5.879563
         age2 |    -3.3202   2.479041    -1.34   0.180    -8.179031    1.538632
       child2 |    -.15025   .1036912    -1.45   0.147     -.353481    .0529811
   child3plus |  -.0217902   .0493033    -0.44   0.659    -.1184228    .0748425
       agekid |    3.87139   1.229144     3.15   0.002     1.462312    6.280467
adhousemember |   .3398256   .1494529     2.27   0.023     .0469032    .6327479
      weekday |  -.5447791   .1918632    -2.84   0.005    -.9208241   -.1687342
     outsourc |  -.1794601   .4767135    -0.38   0.707    -1.113801    .7548811
      daytype |   -1.38465   .3071422    -4.51   0.000    -1.986638   -.7826627
        cohab |  -.0172607   .2427991    -0.07   0.943    -.4931382    .4586169
--------------+----------------------------------------------------------------
unexplained   |
   unempinact |   3.394735   2.368342     1.43   0.152     -1.24713    8.036599
     parttime |   .8837172   .8639529     1.02   0.306    -.8095994    2.577034
   parttime_p |   .0331187    .253998     0.13   0.896    -.4647082    .5309455
 unempinact_p |  -.0145164   .7870646    -0.02   0.985    -1.557135    1.528102
      highedu |   2.537706   2.234893     1.14   0.256    -1.842603    6.918015
    highedu_p |  -.4941746   1.915923    -0.26   0.796    -4.249315    3.260966
          age |  -61.73677   179.0045    -0.34   0.730    -412.5792    289.1057
         age2 |   29.23735   87.07799     0.34   0.737    -141.4324    199.9071
       child2 |    .542522   2.676713     0.20   0.839     -4.70374    5.788784
   child3plus |   .5723079    .905593     0.63   0.527    -1.202622    2.347238
       agekid |    17.6591    7.92546     2.23   0.026     2.125489    33.19272
adhousemember |   .2793805    1.00629     0.28   0.781    -1.692911    2.251672
      weekday |  -3.111801   3.134674    -0.99   0.321    -9.255648    3.032047
     outsourc |   .9186022   1.246632     0.74   0.461    -1.524751    3.361956
      daytype |   -4.63638   4.378944    -1.06   0.290    -13.21895    3.946193
        cohab |  -.4608457   .7161896    -0.64   0.520    -1.864552    .9428601
        _cons |   20.56741   90.10533     0.23   0.819    -156.0358    197.1706
-------------------------------------------------------------------------------


FernandoRios

Join Date: Apr 2014

Posts: 2492
#4

06 Mar 2019, 11:52

The results are indeed puzzling. I find it odd that the explained component is not statistically significant in aggregate while several of its individual components ARE significant, whereas the opposite holds for the unexplained component. Anyway, that is rare but not impossible.
For your specific question, the exact interpretation may depend on the syntax you are using (is it weight(0), weight(1), omega, or pooled?).
I would suggest reconstructing these results by hand, so you can see more clearly what the coefficients are in each individual regression, what the endowments are, and how they relate to the differences you observed in your output.
For instance, the decomposition is telling you that between 2010 and 2018, housework time decreased by about 3.8 minutes, which is more than explained by the differences in the coefficients (about 6.2 minutes). The changes in the endowments, however, counteracted this change: because of the composition effect alone, women would be doing about 2.4 minutes more housework in 2018 than in 2010.
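As a first step towards checking things by hand, the arithmetic of the "overall" block can be verified in a few lines. This is only a sketch in plain Python (my choice for illustration, not Stata); the four numbers are copied from the output in #3 and the variable names are mine.

```python
# Values copied from the "overall" block of the oaxaca output in #3.
group_1 = 85.91373       # mean housework minutes, year18 = 0
group_2 = 82.14774       # mean housework minutes, year18 = 1
explained = -2.405477    # part attributed to differences in characteristics
unexplained = 6.171472   # part attributed to differences in coefficients

# The reported difference is group_1 minus group_2 ...
difference = group_1 - group_2

# ... and it should equal explained + unexplained up to rounding.
print(round(difference, 4))                # 3.766
print(round(explained + unexplained, 4))   # 3.766
```

The two printed values agree, so the aggregate rows are internally consistent: the negative explained part and the larger positive unexplained part add up to the observed gap.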
HTH
Fernando
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#5

06 Mar 2019, 11:53

You've got a bit of an unusual situation: your explained difference is negative. I believe this means that, judging by the observed characteristics alone, group 2's mean of housework should be 2.41 points higher than group 1's, not lower; put differently, if we equalized their observed characteristics, the gap would widen from 3.77 to 6.17 points. Unexplained + explained = observed disparity; I think that the explained disparity being negative may be the cause of some of your confusion. Look at the example in the oaxaca manual to see what a decomposition more usually looks like.

I don't know of a simple explanation for what the individual coefficients under the unexplained section mean.
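That said, the formula behind a twofold decomposition at least shows where those rows come from. The notation below is my own assumption, following the usual textbook presentation rather than anything in the output: group means of the covariates are written as X-bars, the group-specific coefficient vectors as beta_1 and beta_2, and the reference coefficients (for example, from a pooled regression) as beta-star.

```latex
\bar{Y}_1 - \bar{Y}_2
  = \underbrace{(\bar{X}_1 - \bar{X}_2)'\beta^{*}}_{\text{explained}}
  + \underbrace{\bar{X}_1'(\beta_1 - \beta^{*}) + \bar{X}_2'(\beta^{*} - \beta_2)}_{\text{unexplained}}
```

Read this way, a covariate's row in the unexplained block is the part of the gap attributed to the two years having different coefficients on that variable, rather than different mean levels of it. For a continuous variable, that per-variable term shifts with where the variable's zero point is set, and for dummies it shifts with which category is omitted, so I would interpret individual unexplained rows cautiously.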

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Maria Mendez

Join Date: Sep 2018

Posts: 4
#6

06 Mar 2019, 12:43

Dear Fernando and Weiwen,

Thank you so much for your responses. If I understand correctly, even though my results are not typical, they are plausible. Is there any reference, or would someone be so kind as to explain, why individual covariates are significant but the overall result is not (in the "explained" part)?

The syntax is: oaxaca y x1 x2..., by(year) pooled

Massive thanks
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#7

06 Mar 2019, 17:17

Well, the z-tests all test whether the associated quantity is equal to zero. So, unsurprisingly, the means of group 1 and group 2 are statistically different from zero.

For the total explained disparity, the point estimate is -2.41 units, and the p-value (for H0 that explained = 0) is 0.190.

If you go through the individual covariates under the explained section, some of them explain some of the disparity, e.g. the dummy highedu explains 2.10 points of the disparity, i.e. if you equalized the mean level of highedu, you would expect the disparity to decline by 2.10 points, and a few more have positive coefficients. A bunch of them push the other way, e.g. if you equalized the mean level of unempinact, you'd expect the disparity to increase by 3.01 points.

I don't think there is an easy verbal explanation for why many individual covariates have significant contributions while the sum of their explained effects is not significant. It is what it is. I also don't think it's a substantive concern, personally.
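To see the offsetting at work, here is a small sketch in plain Python (used only for illustration; it is not Stata code). The contributions are copied from the "explained" block of the output posted earlier in the thread, and the names pushes_up and pushes_down are mine.

```python
# "explained" contributions copied from the detailed oaxaca output in this thread.
explained = {
    "unempinact": -3.010797, "parttime": -0.0748738, "parttime_p": -0.0739686,
    "unempinact_p": -1.041083, "highedu": 2.104105, "highedu_p": 0.0964902,
    "age": 1.001825, "age2": -3.3202, "child2": -0.15025,
    "child3plus": -0.0217902, "agekid": 3.87139, "adhousemember": 0.3398256,
    "weekday": -0.5447791, "outsourc": -0.1794601, "daytype": -1.38465,
    "cohab": -0.0172607,
}

# Split the contributions by sign and add them back up.
pushes_up = sum(v for v in explained.values() if v > 0)
pushes_down = sum(v for v in explained.values() if v < 0)
total = pushes_up + pushes_down

print(round(pushes_up, 2), round(pushes_down, 2), round(total, 2))   # 7.41 -9.82 -2.41
```

The positive and negative pieces largely cancel, leaving a small net total. This only illustrates the point estimates; the standard error of the sum also depends on the covariances between the terms, which the table does not show.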

Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#8

27 Mar 2019, 06:07

Just as an afterthought: considering the DV is a count variable, maybe you should think about modeling it with the user-written nldecompose program. This might also resolve the paradox in the results.

Best regards,

Marcos