  • Simple question on OLS & Oaxaca decomposition

    Dear StataList,

    I would like to ask a question, if you would be so kind as to answer. I ran an OLS regression to estimate the change in hours worked between 2010 and 2018, controlling for other factors. My coefficient is 8 minutes. However, when I use oaxaca to decompose these 8 minutes into an explained and an unexplained part, oaxaca gives me a total change (difference) of 12 minutes.

    Is that possible? Or should oaxaca also give me the same 8 minutes, rather than a different number?

    I would greatly appreciate any hint. Thank you so much.

  • #2
    Hi Maria
    Could you provide more details about the problem? In particular, can you post the results you obtain from oaxaca and from your OLS regression?
    In any case, the results from oaxaca and from OLS will not necessarily be the same, because they are two different models.
    The only case I can think of that would give you the same value every time is if the OLS gap is obtained with no controls.
    HTH
    Fernando
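Fernando's last point can be illustrated with a small simulation. The Python sketch below uses synthetic data (not the data in this thread), with a hypothetical year dummy `year18` and one hypothetical control `x` whose mean shifts between years. Without controls, the coefficient on the year dummy reproduces the raw mean gap exactly (the quantity oaxaca reports as "difference", apart from sign), while adding the control changes the coefficient.

```python
import numpy as np

# Synthetic data only (hypothetical, NOT the data in this thread).
# year18 = 1 for the later survey; x is one control whose mean shifts over time.
rng = np.random.default_rng(0)
n = 2000
year18 = (rng.random(n) < 0.5).astype(float)
x = rng.normal(loc=1.0 * year18, scale=1.0)        # composition changes with year
y = 80.0 - 4.0 * year18 - 5.0 * x + rng.normal(scale=10.0, size=n)

def ols(y, *cols):
    """Return OLS coefficients [intercept, b1, b2, ...] via least squares."""
    X = np.column_stack([np.ones_like(y), *cols])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Raw gap between years. oaxaca reports group_1 - group_2; with group 1 being
# year18 == 0 (as in the output in post #3), that is minus this number.
raw_gap = y[year18 == 1].mean() - y[year18 == 0].mean()

b_no_controls = ols(y, year18)[1]    # year coefficient, no controls
b_controls = ols(y, year18, x)[1]    # year coefficient, controlling for x

print(f"raw gap            : {raw_gap:.4f}")
print(f"OLS, no controls   : {b_no_controls:.4f}")   # equals the raw gap (up to rounding)
print(f"OLS, with controls : {b_controls:.4f}")      # generally different
```

In this setup the control accounts for part of the raw gap, so the two OLS coefficients differ; the same logic can make an OLS year coefficient with controls differ from oaxaca's total difference.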

    • #3
      Dear Fernando,

      Thank you so much for your response. So I should not worry about getting different results from the oaxaca and OLS models.

      I would be very grateful, Fernando (or anyone else), if you could answer one more question. Below is one of the outputs from my Oaxaca analysis of changes in women's housework time (in minutes). The "explained" part is clear to interpret: it reflects changes in composition. For example, women would spend -3 minutes in 2018 if they had the characteristics of those in 2010, right? But how do you interpret the "unexplained" part for the control variables? For example, the age of the child (agekid) appears significant in the unexplained part. How could you interpret that?

      Many many thanks.
      All the best



      Code:
      Blinder-Oaxaca decomposition                    Number of obs     =      5,566
                                                        Model           =     linear
      Group 1: year18 = 0                               N of obs 1      =       1762
      Group 2: year18 = 1                               N of obs 2      =       3804
      
      -------------------------------------------------------------------------------
                    |               Robust
          housework |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      --------------+----------------------------------------------------------------
      overall       |
            group_1 |   85.91373   2.354908    36.48   0.000      81.2982    90.52927
            group_2 |   82.14774   1.535938    53.48   0.000     79.13736    85.15812
         difference |   3.765995   2.811529     1.34   0.180    -1.744501    9.276491
          explained |  -2.405477   1.835476    -1.31   0.190    -6.002943    1.191989
        unexplained |   6.171472   2.556521     2.41   0.016     1.160783    11.18216
      --------------+----------------------------------------------------------------
      explained     |
         unempinact |  -3.010797   .4898713    -6.15   0.000    -3.970927   -2.050667
           parttime |  -.0748738   .4519227    -0.17   0.868     -.960626    .8108783
         parttime_p |  -.0739686   .1913108    -0.39   0.699    -.4489309    .3009938
       unempinact_p |  -1.041083   .3122599    -3.33   0.001    -1.653102   -.4290653
            highedu |   2.104105   .4339075     4.85   0.000     1.253662    2.954549
          highedu_p |   .0964902   .2182272     0.44   0.658    -.3312274    .5242077
                age |   1.001825   2.488688     0.40   0.687    -3.875913    5.879563
               age2 |    -3.3202   2.479041    -1.34   0.180    -8.179031    1.538632
             child2 |    -.15025   .1036912    -1.45   0.147     -.353481    .0529811
         child3plus |  -.0217902   .0493033    -0.44   0.659    -.1184228    .0748425
             agekid |    3.87139   1.229144     3.15   0.002     1.462312    6.280467
      adhousemember |   .3398256   .1494529     2.27   0.023     .0469032    .6327479
            weekday |  -.5447791   .1918632    -2.84   0.005    -.9208241   -.1687342
           outsourc |  -.1794601   .4767135    -0.38   0.707    -1.113801    .7548811
           daytype |   -1.38465   .3071422    -4.51   0.000    -1.986638   -.7826627
              cohab |  -.0172607   .2427991    -0.07   0.943    -.4931382    .4586169
      --------------+----------------------------------------------------------------
      unexplained   |
         unempinact |   3.394735   2.368342     1.43   0.152     -1.24713    8.036599
           parttime |   .8837172   .8639529     1.02   0.306    -.8095994    2.577034
         parttime_p |   .0331187    .253998     0.13   0.896    -.4647082    .5309455
       unempinact_p |  -.0145164   .7870646    -0.02   0.985    -1.557135    1.528102
            highedu |   2.537706   2.234893     1.14   0.256    -1.842603    6.918015
          highedu_p |  -.4941746   1.915923    -0.26   0.796    -4.249315    3.260966
                age |  -61.73677   179.0045    -0.34   0.730    -412.5792    289.1057
               age2 |   29.23735   87.07799     0.34   0.737    -141.4324    199.9071
             child2 |    .542522   2.676713     0.20   0.839     -4.70374    5.788784
         child3plus |   .5723079    .905593     0.63   0.527    -1.202622    2.347238
             agekid |    17.6591    7.92546     2.23   0.026     2.125489    33.19272
      adhousemember |   .2793805    1.00629     0.28   0.781    -1.692911    2.251672
            weekday |  -3.111801   3.134674    -0.99   0.321    -9.255648    3.032047
           outsourc |   .9186022   1.246632     0.74   0.461    -1.524751    3.361956
            daytype |   -4.63638   4.378944    -1.06   0.290    -13.21895    3.946193
              cohab |  -.4608457   .7161896    -0.64   0.520    -1.864552    .9428601
              _cons |   20.56741   90.10533     0.23   0.819    -156.0358    197.1706
      -------------------------------------------------------------------------------

      • #4
        The results are indeed puzzling. I find it odd that the explained component is not statistically significant in aggregate, while several of its individual components ARE significant, whereas the opposite is true for the unexplained component. Anyway, that is rare but not impossible.
        For your specific question, the exact interpretation may depend on the syntax you are using (whether it is w(0), w(1), omega, or pooled).
        I would suggest reconstructing these results by hand, so you can see more clearly what the coefficients in each individual regression are, what the endowments are, and how they relate to the differences you observe in your output.
        For instance, the decomposition is telling you that between 2010 and 2018, housework time decreased by about 3.8 minutes (from 85.9 to 82.1), which is more than explained by the differences in the coefficients (6.2 minutes). The changes in the endowments, however, counteracted this change: because of the composition effect alone, women would be doing 2.4 minutes more housework in 2018 than in 2010.
        HTH
        Fernando
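Fernando's suggestion to reconstruct the results by hand can be sketched in Python on synthetic data (two hypothetical controls, not Maria's variables). The sketch fits OLS separately in each group, takes reference coefficients b* from a pooled regression over both groups that includes a group dummy (my understanding of what `oaxaca ..., pooled` does; please check the oaxaca help file), and then forms explained = (m1 - m2)·b* and unexplained = m1·(b1 - b*) + m2·(b* - b2), where m1 and m2 are the group means of the regressors, including the constant.

```python
import numpy as np

# Synthetic data only (hypothetical): two groups, two controls.
rng = np.random.default_rng(1)
n1, n2 = 1500, 1500
X1 = rng.normal(loc=[0.0, 1.0], size=(n1, 2))
X2 = rng.normal(loc=[0.5, 0.5], size=(n2, 2))
y1 = 85 + X1 @ np.array([3.0, -2.0]) + rng.normal(scale=10, size=n1)
y2 = 80 + X2 @ np.array([4.0, -1.0]) + rng.normal(scale=10, size=n2)

def with_const(X):
    """Prepend a column of ones (the constant)."""
    return np.column_stack([np.ones(len(X)), X])

def ols(X, y):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

b1 = ols(with_const(X1), y1)   # group 1 coefficients
b2 = ols(with_const(X2), y2)   # group 2 coefficients

# Pooled reference coefficients: both groups stacked, plus a group dummy
# (assumed to mirror -pooled-). The dummy's coefficient is dropped afterwards.
Xp = np.vstack([with_const(X1), with_const(X2)])
g = np.r_[np.zeros(n1), np.ones(n2)]
bp = ols(np.column_stack([Xp, g]), np.r_[y1, y2])[:-1]

m1 = with_const(X1).mean(axis=0)   # group means ("endowments"), constant = 1
m2 = with_const(X2).mean(axis=0)

difference = y1.mean() - y2.mean()
explained = (m1 - m2) @ bp                       # composition part
unexplained = m1 @ (b1 - bp) + m2 @ (bp - b2)    # coefficient part, incl. constant

detailed_explained = (m1 - m2)[1:] * bp[1:]             # one term per control
detailed_unexplained = m1 * (b1 - bp) + m2 * (bp - b2)  # first entry is _cons

print(f"difference  {difference:8.4f}")
print(f"explained   {explained:8.4f}")
print(f"unexplained {unexplained:8.4f}")
print("detailed explained  :", detailed_explained)
print("detailed unexplained:", detailed_unexplained)
```

Because each group regression includes a constant, each group's mean of y equals m·b exactly, so explained + unexplained reproduces the raw difference for any choice of b*. One consequence that matters when reading individual unexplained rows (such as agekid): shifting a variable's origin, e.g. by centering it, moves part of its unexplained term into _cons while leaving the total unchanged.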

        • #5
          You've got a bit of an unusual situation: your explained difference is negative. I believe this means that, based on the differences in observed characteristics alone (valued at the same reference coefficients), group 2's mean housework should be 2.40 minutes higher than group 1's, not lower. Unexplained + explained = observed disparity, and I think the negative explained disparity may be the cause of some of your confusion. Look at the examples in the oaxaca documentation to see what a more typical decomposition looks like.
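The identity "unexplained + explained = observed disparity" can be checked directly against the output in post #3. The short Python sketch below uses only figures copied from that table (nothing is re-estimated) and also checks that the detailed rows add up to their reported totals, up to rounding in the displayed output.

```python
# Figures copied from the oaxaca output in post #3 (nothing is re-estimated).
group_1, group_2 = 85.91373, 82.14774
difference, explained, unexplained = 3.765995, -2.405477, 6.171472

explained_rows = [
    -3.010797, -.0748738, -.0739686, -1.041083, 2.104105, .0964902,  # unempinact ... highedu_p
    1.001825, -3.3202, -.15025, -.0217902, 3.87139, .3398256,         # age ... adhousemember
    -.5447791, -.1794601, -1.38465, -.0172607,                         # weekday ... cohab
]
unexplained_rows = [
    3.394735, .8837172, .0331187, -.0145164, 2.537706, -.4941746,     # unempinact ... highedu_p
    -61.73677, 29.23735, .542522, .5723079, 17.6591, .2793805,        # age ... adhousemember
    -3.111801, .9186022, -4.63638, -.4608457,                          # weekday ... cohab
    20.56741,                                                          # _cons
]

print(f"group_1 - group_2       = {group_1 - group_2:.6f}  (reported {difference})")
print(f"explained + unexplained = {explained + unexplained:.6f}")
print(f"sum of explained rows   = {sum(explained_rows):.6f}  (reported {explained})")
print(f"sum of unexplained rows = {sum(unexplained_rows):.6f}  (reported {unexplained})")
```

The very large age, age2 and _cons terms in the unexplained section largely offset one another, which is one reason individual unexplained rows are hard to read on their own.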

          I don't know of a simple explanation for what the individual coefficients under the unexplained section mean.
          Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

          When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

          • #6
            Dear Fernando and Weiwen,

            Thank you so much for your responses. If I understand correctly, even though my results are not typical, they are plausible. Is there a reference, or could someone kindly explain, why individual covariates are significant but the overall result is not (in the "explained" part)?

            The syntax is: oaxaca y x1 x2..., by(year) pooled

            Massive thanks

            • #7
              Well, the z-tests all test whether the associated quantity is equal to zero. So, obviously, the means of group 1 and group 2 are statistically different from zero.

              For the total explained disparity, the point estimate is -2.40 units, and the p-value (for H0 that explained = 0) is 0.190.
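The z-statistics, p-values and 95% intervals in the output can be reproduced from each coefficient and its standard error alone. A minimal Python sketch, assuming the usual normal-based two-sided test that the table's z and P>|z| columns suggest:

```python
import math

def z_test(coef, se):
    """Two-sided z-test of H0: quantity = 0, plus a normal-based 95% CI."""
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2.0))   # equals 2 * (1 - Phi(|z|))
    half = 1.959964 * se                     # approximate 97.5% normal quantile
    return z, p, (coef - half, coef + half)

# Rows taken from the output in post #3: (label, coefficient, std. err.).
rows = [("explained", -2.405477, 1.835476),
        ("unexplained", 6.171472, 2.556521),
        ("highedu (explained)", 2.104105, .4339075)]

for name, coef, se in rows:
    z, p, (lo, hi) = z_test(coef, se)
    print(f"{name:20s} z={z:6.2f}  p={p:.3f}  95% CI=[{lo:.4f}, {hi:.4f}]")
```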

              If you go through the individual covariates under the explained section, some of them explain part of the disparity, e.g. the dummy for highedu explains 2.10 points of the disparity, i.e. if you equalized the mean level of highedu across the two groups, you would expect the disparity to decline by 2.10 points; a few more have positive coefficients. A bunch of them push the other way, e.g. if you equalized the mean level of unempinact, you'd expect the disparity to increase by 3.01 points.

              I don't think there is an easy verbal explanation for why many individual covariates have significant explained contributions while their sum is not significant. It is what it is. Personally, I also don't think it's a substantive concern.
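One mechanical way this can happen is sketched below with made-up numbers, not the output in post #3: two precisely estimated contributions of opposite sign largely cancel, and a third, noisy term (loosely like the age and age2 rows, whose standard errors are about 2.5) inflates the standard error of the total. The sketch assumes the components are independent; the standard errors oaxaca reports also reflect covariances between terms, so this is a toy illustration only.

```python
import math

# Hypothetical (coefficient, std. err.) pairs, assumed independent for simplicity.
components = [(2.1, 0.45), (-3.0, 0.50), (-1.5, 1.8)]

def p_value(coef, se):
    """Two-sided normal p-value for H0: coef = 0."""
    return math.erfc(abs(coef / se) / math.sqrt(2.0))

for coef, se in components:
    print(f"component {coef:5.2f}  se={se:.2f}  p={p_value(coef, se):.4f}")

total = sum(c for c, _ in components)
se_total = math.sqrt(sum(se ** 2 for _, se in components))  # independence assumed
print(f"total     {total:5.2f}  se={se_total:.2f}  p={p_value(total, se_total):.4f}")
```

Here the first two components are individually significant, yet the total is not, because the offsetting signs shrink the sum while the noisy third term keeps its standard error large.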
              • #8
                Just as an afterthought: considering the DV is a count variable, maybe you should think about modeling it with the user-written -nldecompose- program. This might also resolve the paradox in the results.
                Best regards,

                Marcos
