  • Multiple Imputation with Oaxaca Decomposition (Stata MP 15.1, oaxaca v4.1.1, mi v1.3.3)

    Hi everyone,

    I'm working with survey data in Stata MP 15.1 and am hoping for advice or suggestions from anyone who has attempted to combine multiple imputation (MI) with the Oaxaca-Blinder decomposition.
    Key details:

    • Stata version: MP 15.1
    • oaxaca package: Version 4.1.1
    • mi package: Version 1.3.3
    • Data type: Complex survey data (with probability weights)
    • Outcome variable: Binary
    • Main objective: Incorporate MI results into a decomposition analysis
    As far as I understand, the oaxaca command does not natively support the mi estimate: prefix. I haven’t found documented support for this combination in the official documentation or user forums.
    What I’ve tried / considered:

    • I am considering running the oaxaca decomposition separately on each imputed dataset in a loop, then exporting the results (e.g., explained and unexplained components, standard errors) for manual pooling.
    • I’ve also come across a 2013 post on Statalist suggesting the use of:
    Code:
    mi estimate, cmdok: oaxaca ... 
    • However, I’m not sure whether this approach produces valid results, since oaxaca performs more than just regression and may not fully align with Rubin’s rules or MI assumptions.
    • I’ve reviewed the literature and noted that some studies use listwise deletion or missing indicator methods with Oaxaca-Blinder decomposition (e.g., Newman, 2014), but these approaches have known limitations.
    Newman, D. A. (2014). Missing Data: Five Practical Guidelines. Organizational Research Methods, 17(4), 372–411. https://doi.org/10.1177/1094428114548590
    My questions:

    1. Has anyone attempted manual pooling of oaxaca output across imputed datasets? If so, how did you handle combining the explained and unexplained components?
    2. Is there a practical way to extract decomposition estimates and standard errors from each imputed dataset (e.g., using estimates store) and apply Rubin’s rules manually?
    3. Is there a better-suited user-written command or workaround for performing Oaxaca-Blinder decomposition with multiply imputed data?
    Thanks in advance for your help.
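The loop-and-pool idea from questions 1 and 2 could look roughly like the sketch below. It is only an illustration under stated assumptions: `y`, `x1`, `x2`, `female`, and `wt` are hypothetical variable names, the data are assumed to be already `mi set` with at least two imputations, and only the probability weight is used (strata and PSUs of the survey design are ignored here). Whether Rubin's rules are statistically appropriate for the decomposition terms is exactly the open question of this thread; the code only shows the mechanics.

```stata
* Sketch only: y, x1, x2, female and wt are hypothetical variable names.
* Assumes the data in memory are mi set and hold M >= 2 imputations.
mi query
local M = r(M)                        // number of imputations

* Step 1: run oaxaca on each completed dataset and keep e(b) and e(V)
forvalues m = 1/`M' {
    preserve
    mi extract `m', clear             // keep only completed dataset m
    quietly oaxaca y x1 x2 [pweight = wt], by(female) logit
    matrix b`m' = e(b)                // decomposition estimates
    matrix V`m' = e(V)                // their variance-covariance matrix
    restore
}

* Step 2: pooled estimate (Qbar) and within-imputation variance (W)
matrix Qbar = b1
matrix W    = V1
forvalues m = 2/`M' {
    matrix Qbar = Qbar + b`m'
    matrix W    = W + V`m'
}
matrix Qbar = Qbar / `M'
matrix W    = W / `M'

* Step 3: between-imputation variance (B)
matrix B = J(colsof(Qbar), colsof(Qbar), 0)
forvalues m = 1/`M' {
    matrix d = b`m' - Qbar
    matrix B = B + d' * d
}
local Mm1 = `M' - 1
matrix B = B / `Mm1'

* Step 4: total variance T = W + (1 + 1/M) * B
local c = 1 + 1/`M'
matrix T = W + `c' * B

matrix list Qbar
matrix list T
```

The pooled standard errors are the square roots of the diagonal of `T`. Comparing `Qbar` and `T` with the output of `mi estimate, cmdok: oaxaca ...` on the same model would be one way to check that both routes do the same arithmetic.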

  • #2
    Personally, I would just go with mi estimate, cmdok. As long as the command works with this syntax, you can be sure that Rubin's rules can be applied, at least in theory. Unless you find a paper that proves why a different approach gives better results for this specific command, using the standard approach is your best option. I am not aware of other approaches that are specific to oaxaca.
    Best wishes

    Stata 18.0 MP | ORCID | Google Scholar



    • #3
      Originally posted by Felix Bittmann:
      As long as the command works with this syntax [i.e., mi estimate, cmdok], you can be sure that Rubin's rules can be applied, at least in theory.
      I believe this statement is slightly misleading. Yes, the cmdok option works -- technically -- with any command that posts e(b) and e(V). That does not imply that Rubin's rules are applicable.

      As for the original question, I'm far from being an expert on Oaxaca-Blinder decomposition. However, this does not seem straightforward to me. As far as I understand, the decomposition is based on differences in coefficients and differences in means of predictors. Both should be approximately normal, but the decomposition appears to involve non-linear functions of these components, which may not be normal, even approximately.

      Estimation aside, I'm not even sure what the imputation model should look like. As the decomposition involves differences in coefficients, missing values should probably be imputed separately for each group; otherwise, the imputation model constrains coefficients to be the same across groups. However, imputing separately by group might risk breaking cross-group correlations of variables, and generally doesn't allow imputed variables to depend directly on group membership, which seems at odds with the core ideas behind the decomposition. I haven't given this too much thought, and I might over- or undercomplicate matters.
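To make the non-linearity concrete, here is the standard threefold decomposition for the linear case. The notation is an assumption for illustration: A and B label the two groups, bars denote sample means of the predictors, and hats denote estimated coefficients.

```latex
\hat{R} = \bar{Y}_A - \bar{Y}_B
        = (\bar{X}_A - \bar{X}_B)'\hat{\beta}_B
        + \bar{X}_B'(\hat{\beta}_A - \hat{\beta}_B)
        + (\bar{X}_A - \bar{X}_B)'(\hat{\beta}_A - \hat{\beta}_B)
```

Each term is a product of estimated means and estimated coefficients, which is where the concern about approximate normality comes from. With a binary outcome and a logit model, the terms are built from predicted probabilities rather than linear predictions, so the linear form above is only a simplified illustration.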



      • #4
        Thanks for the clarification; what you describe is indeed what I meant. e(b) and e(V) need to be available, and for some commands mi estimate does not make any sense (for example, a cluster analysis). However, I would argue that as long as the required statistics (coefficients, means, or whatever else) and their standard errors are available, the rules can be applied, at least in theory. The results might be off to some degree; if this has never been tested, we cannot be sure. In the end, the usual benchmark is the unimputed analysis: is the error larger with listwise deletion, or with the rules applied, even if untested? My personal feeling is that any imputation approach is usually superior to doing nothing. I would be surprised if, for some specific reason, oaxaca introduced severe bias under the standard imputation procedures. The person most qualified to answer this question is probably Ben Jann.
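For reference, the rules under discussion can be stated in a few lines. This is a sketch for a single scalar quantity; the symbols are notation introduced here, with \hat{Q}_m the estimate and U_m its squared standard error in imputed dataset m of M.

```latex
\bar{Q} = \frac{1}{M}\sum_{m=1}^{M}\hat{Q}_m, \qquad
\bar{U} = \frac{1}{M}\sum_{m=1}^{M} U_m, \qquad
B = \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\hat{Q}_m - \bar{Q}\bigr)^2, \qquad
T = \bar{U} + \Bigl(1 + \frac{1}{M}\Bigr) B
```

The pooled standard error is \sqrt{T}. The formulas only need an estimate and a variance per imputation, which is why they can always be computed; whether the resulting inference is valid depends on each \hat{Q}_m being approximately normal, which is the point raised in #3.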
        Best wishes



        • #5
          Thank you for the responses.

          Currently, using mi estimate, cmdok: seems to be the most practical approach for combining multiple imputation with Oaxaca decomposition, given the lack of fully satisfactory alternatives, as highlighted in Table 4 of Newman (2014). However, I will be sure to note that the standard errors derived from this method should be regarded as provisional, given ongoing questions about their statistical accuracy.


