Difficulty graphing explained/unexplained results of the “oaxaca” package

Monica Aswani

Join Date: Aug 2015

Posts: 8
#1


24 Jul 2018, 11:11

Does anyone have advice on how to produce a graph comparable to this one in Stata after running the oaxaca package? I have explored gdecomp, which does have a graph option; however, it produces missing standard errors for a handful of variables (it also does not seem to have a “pooled” option comparable to oaxaca so I get different results). Another package, mmsel, is nice for an overall counterfactual distribution picture, but it’s not exactly what I am looking for either. Does anyone know of a package that does it, or have suggestions on the best way to graph the output from the oaxaca command itself? Thanks in advance.

Last edited by Monica Aswani; 24 Jul 2018, 11:15.

Ariel Karlinsky

Join Date: Jun 2015

Posts: 491
#2

20 Nov 2018, 06:32

Hi Monica, did you get any help on this? I'm interested as well
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#3

20 Nov 2018, 11:04

I would also be interested in getting others' feedback. Right now, I can think of two ways to do this:

1) Export the results to Excel, then work from there.

2) Use -coefplot- (available on SSC, written by Ben Jann, who also wrote the -oaxaca- command). Explained and unexplained are output as equation names. All the covariates are output with their usual names. I've tried this, and I've found the syntax to be tricky. The output was not that attractive.
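
For option 1, a minimal sketch of the export step, assuming Stata 14 or later (for this putexcel syntax) and the example dataset that ships with oaxaca. The file name oaxaca_results.xlsx and the matrix names b, V, se, and out are placeholders I made up for illustration. It writes the point estimates and standard errors side by side so the bars can be built in Excel.

```stata
* Sketch only: export oaxaca estimates and standard errors to Excel.
* oaxaca_results.xlsx, b, V, se, and out are placeholder names.
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
oaxaca lnwage educ exper tenure, by(female) pooled

matrix b  = e(b)'          // coefficients as a column vector
matrix V  = e(V)           // variance-covariance matrix
matrix se = b              // copy b so the row names carry over

* Overwrite each element of se with the matching standard error
local k = rowsof(b)
forvalues i = 1/`k' {
    matrix se[`i', 1] = sqrt(V[`i', `i'])
}

matrix out = b, se         // one row per coefficient: estimate, std. err.
matrix colnames out = coef se

putexcel set oaxaca_results.xlsx, replace
putexcel A1 = matrix(out), names
```

The row labels come from the equation and coefficient names (for example explained:educ); how they are laid out in the sheet may depend on your Stata version, so check the file before charting.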

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
David Benson

Join Date: Oct 2018

Posts: 489
#4

20 Nov 2018, 12:37

I would do it in Excel (but that's only because I haven't done graphics a lot in Stata). For a couple of examples of how to do it in Excel see:
Positive Negative Bar Chart

Floating Bars in Excel Charts (scroll down to the part on horizontal bars)

They show how to do the positive / negative for a bar chart. You could do the same thing for a box-whisker plot.

Weiwen Ng

Join Date: Jun 2015
Posts: 1241
#5

10 Feb 2019, 12:00

Resurrecting this old topic. I may be able to present a solution.

Code:

use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta
oaxaca lnwage educ exper tenure, by(female) pooled
Blinder-Oaxaca decomposition                    Number of obs     =      1,434
                                                  Model           =     linear
Group 1: female = 0                               N of obs 1      =        751
Group 2: female = 1                               N of obs 2      =        683

------------------------------------------------------------------------------
             |               Robust
      lnwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
overall      |
     group_1 |   3.440222   .0174586   197.05   0.000     3.406004     3.47444
     group_2 |   3.266761   .0218042   149.82   0.000     3.224026    3.309497
  difference |   .1734607   .0279325     6.21   0.000      .118714    .2282075
   explained |    .089347   .0137531     6.50   0.000     .0623915    .1163026
 unexplained |   .0841137    .025333     3.32   0.001      .034462    .1337654
-------------+----------------------------------------------------------------
explained    |
        educ |   .0493404   .0113168     4.36   0.000     .0271599     .071521
       exper |   .0215214   .0064081     3.36   0.001     .0089617    .0340811
      tenure |   .0184852   .0051833     3.57   0.000     .0083262    .0286443
-------------+----------------------------------------------------------------
unexplained  |
        educ |  -.0656254    .139432    -0.47   0.638    -.3389072    .2076564
       exper |  -.0421741   .0411638    -1.02   0.306    -.1228537    .0385055
      tenure |   .0476693   .0271699     1.75   0.079    -.0055828    .1009213
       _cons |   .1442439   .1624352     0.89   0.375    -.1741233    .4626112
------------------------------------------------------------------------------
 mat list e(b)

e(b)[1,12]
         overall:      overall:      overall:      overall:      overall:    explained:
         group_1       group_2    difference     explained   unexplained          educ
y1      3.440222     3.2667612     .17346074     .08934705      .0841137     .04934044

       explained:    explained:  unexplained:  unexplained:  unexplained:  unexplained:
           exper        tenure          educ         exper        tenure         _cons
y1     .02152136     .01848524    -.06562539    -.04217411     .04766925     .14424394

The graphic in the original post looks like it's plotting explained and unexplained differences from a twofold decomposition (here, the model coefficients aren't allowed to differ between the reference and focal groups). However, this logic should extend to the threefold decomposition.

The solution uses coefplot (Ben Jann, available on SSC, same author as oaxaca). You have to install coefplot. Stata's coefficient names are stored in the format of equation_name:coefficient_name. Most of us are familiar with single-equation models, and may not have heard of multiple equation models. The oaxaca command is treated as a multiple equation model under the hood. explained: refers to the explained part of the disparity. To plot:

Code:

coefplot (., keep(explained:*)),  bylabel("Explained") || ///
(., keep(unexplained:*)),  bylabel("Unexplained") || , ///
drop(*:_cons) recast(bar) barwidth(0.5) citop ciopts(recast(rcap) color(black))

The first line tells coefplot to create one subgraph, keeping only the coefficients related to the explained part of the model. The second line does the same for the unexplained part of the disparity. The third line invokes a bunch of options, mainly recasting the graph into bars (default is dots and whiskers), changing the bar width, putting the CIs on top of the bars, changing the CIs to capped lines, and coloring the CIs black (otherwise you can't see them). You will get the attached graph. There's undoubtedly room for improving this if you're familiar with coefplot. In particular, I'd like to see some sort of vertical grid lines in the color scheme. I wasn't able to get the colors to alternate between explained and unexplained, or among the individual coefficients in each subgraph. And this will probably be very unwieldy if you have a lot of coefficients you want to graph. If you do, though, do take note of the option to group coefficients in the oaxaca command itself. You can then use keep statements to restrict things to your desired groupings.

[Attached image: Graph.png]

Finally, I know this isn't an R forum, but there is an oaxaca package with integrated graphing in R. It is also called oaxaca, but the author is different. The graphing command is more intuitive, I think, but you will have to learn R, and I don't see an option to group coefficients like in our oaxaca package.

Last edited by Weiwen Ng; 10 Feb 2019, 12:06.


Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#6

11 Feb 2019, 09:09

Some minor refinements:

Code:

coefplot (., keep(explained:*)), bylabel("Explained") || ///
(., keep(unexplained:*)), bylabel("Unexplained") || , ///
drop(*:_cons) recast(bar) barwidth(0.5) citop ciopts(recast(rcap) color(black)) ///
byopts(cols(1)) xline(0, lpattern(dash)) xlabel(, grid glstyle(minor_grid) glpattern(dash))

In the last line, I arranged the two subgraphs in one column, which makes the scale a lot easier to read. I added a zero line through the x-axis (that's the red dashed line; red is the default color). In the xlabel option, which normally alters the axis labels and ticks, I asked for a grid extending through the plot area, which helps you visualize the magnitudes of each coefficient better. I also changed the grid line style and pattern (the default is a very faint solid line). Readers who are interested in further customization of the scheme can do more research on their own, but basically you can use most of the available (graph) twoway options. In coefplot, some options have to go into the byopts option, but most of the ones that affect the graphs go into the main set of options. If specifying colors for each subgraph, I think you need to add them after each subgraph (not tested).

By now, this is getting pretty close to the presentation in the original post. For reference, that graphic looks like it was produced by the R package oaxaca (by Marek Hlavac, available on CRAN). As a matter of fact, the coefficients listed look like they're from that package's stock dataset.

I initially held off on posting code here because I had over 20 coefficients to graph, and I thought the code was a bit too tedious. It's more doable than I initially thought. If you're in this situation, you may want to take note of the option (in the Stata package) to group or subsume some coefficients, e.g.:

Code:

oaxaca lnwage educ (expten: exper tenure), by(female)

You'll get coefficients called explained:expten and unexplained:expten if using the twofold decomposition, or endowments:, coefficients:, and interaction: if using the threefold decomposition. You won't get the individual coefficients that you subsumed. Again, this option appears to be missing from this particular R oaxaca package, and I'm not familiar with other R packages that do Oaxaca-Blinder decomposition.
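
As a rough sketch of how the same coefplot approach might carry over to the threefold decomposition with grouped covariates: the command above has no pooled option, so it produces the endowments:, coefficients:, and interaction: equations, and each one gets its own subgraph. I have only adapted the twofold code by swapping the equation names; treat it as a starting point and not as tested code.

```stata
* Sketch only: threefold decomposition with exper and tenure grouped
* as expten, then one coefplot subgraph per equation.
use http://fmwww.bc.edu/RePEc/bocode/o/oaxaca.dta, clear
oaxaca lnwage educ (expten: exper tenure), by(female)

coefplot (., keep(endowments:*)), bylabel("Endowments") || ///
(., keep(coefficients:*)), bylabel("Coefficients") || ///
(., keep(interaction:*)), bylabel("Interaction") || , ///
drop(*:_cons) recast(bar) barwidth(0.5) citop ciopts(recast(rcap) color(black)) ///
byopts(cols(1)) xline(0, lpattern(dash))
```

Each subgraph should then show only educ and expten, since the subsumed exper and tenure coefficients are not reported individually.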

Phetetso Mofolo

Join Date: Mar 2021

Posts: 2
#7

08 Mar 2021, 01:53

Hi, I am currently doing my master's research study, and I am finding it challenging to find a Stata command that gives a detailed Oaxaca decomposition for a nonlinear model such as the negative binomial model. The command "gdecomp" returns "invalid name" when I try to use it. Which command can I use instead?
