  • Oaxaca decomposition for nonlinear regression (nldecompose)

    Hello readers,
    I am trying to decompose (using Ben Jann’s nldecompose) differences in the probability of having more liberal gender views into differences in characteristics (cohort replacement) and differences in coefficients (intra-cohort change).

    The commands and output are as follows:

    Code:
    . nldecompose, by(Period): logit ideology1 Age [pweight=weight]
    
                                                       Number of obs (A) =    1872
                                                       Number of obs (B) =    1403
    
    ------------------------------------------------------------------------------
          Results |      Coef.  Percentage
    --------------+---------------------------------------------------------------
     Omega = 1    |
             Char |  -.0078933   7.860032%
             Coef |  -.0925304   92.13997%
    --------------+---------------------------------------------------------------
     Omega = 0    |
             Char |  -.0064641   6.436787%
             Coef |  -.0939597   93.56321%
    --------------+---------------------------------------------------------------
              Raw |  -.1004237        100%
    ------------------------------------------------------------------------------
    
    .
    I have read the Stata manual, but I still have a few questions and would be grateful if anyone could clarify:

    1. What is the justification for using a twofold rather than a threefold decomposition? I understand how they differ, but not when to use each one (i.e. the twofold says that if the observed variables had the same effect in each period, they would explain x% of the observed disparity in gender views; the threefold adds the disparity in the returns to these observed variables).

    2. In a number of academic papers I have read which use nldecompose, the authors have decomposed change into:
    • Differences attributable to observable characteristics (“Char”)
    • Differences not attributable to observable characteristics (“Coef”)
    • Total difference
    I can see the first two parts in my output (char and coef) but how does one get the “total difference”?

    For example in:

    Arndt, B.J., 2017. Explaining Income-Related Disparities in Pap Smear Utilization: A regression-based decomposition analysis of differences in Pap smear utilization following implementation of the Affordable Care Act (No. 1694-2017-5829). – TABLE A3

    Kelly, E., McGuinness, S., O’connell, P.J., Haugh, D. and Pandiella, A.G., 2014. Transitions in and out of unemployment among young people in the Irish recession. Comparative Economic Studies, 56(4), pp.616-634. – TABLE 7

    An example of my data:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input byte(Period ideology1) int Age
    0 1 43
    0 1 48
    1 1 34
    1 1 24
    0 0 27
    1 1 24
    0 0 32
    1 1 36
    1 0 38
    1 1 38
    0 1 71
    0 1 39
    0 0 60
    0 1 25
    0 1 56
    0 1 51
    0 0 34
    0 1 19
    0 1 70
    0 1 62
    0 1 36
    1 0 45
    1 1 27
    1 1 16
    0 1 37
    1 1 38
    0 1 30
    1 1 52
    0 1 38
    1 1 17
    0 1 58
    0 1 42
    0 1 16
    0 1 41
    0 1 47
    0 1 41
    0 1 16
    0 1 40
    0 1 33
    0 0 51
    0 1 44
    0 1 53
    0 0 36
    0 0 58
    0 1 65
    0 0 52
    0 1 17
    0 1 20
    0 1 44
    0 0 38
    0 1 47
    0 1 65
    0 1 41
    0 1 41
    0 1 19
    0 0 56
    0 1 58
    0 1 50
    0 1 65
    0 1 51
    0 1 32
    0 1 18
    0 1 38
    0 0 51
    0 0 56
    0 0 77
    0 0 31
    0 1 40
    0 1 49
    0 0 53
    0 1 40
    0 0 63
    0 0 43
    0 1 51
    0 1 52
    0 1 49
    0 1 60
    0 1 36
    0 1 35
    0 1 76
    0 1 36
    0 1 46
    0 1 69
    0 1 55
    0 0 35
    0 1 25
    0 1 46
    0 1 60
    0 0 38
    0 0 50
    0 1 16
    0 1 62
    0 1 18
    0 0 35
    0 1 31
    0 0 73
    0 1 68
    0 1 50
    0 1 60
    0 1 33
    end
    label values Period per
    label def per 0 "1999-2004", modify
    label def per 1 "2005-2009", modify
    label values ideology1 edc
    label def edc 0 "Agree", modify
    label def edc 1 "Disagree", modify
    label values Age X003


  • #2
    Dear Sherine
    As far as I know, there are no strict rules regarding when to use which methodology. At the end of the day, it is really up to your research question and design. In a paper of mine, for example, I analyze wage differences by BMI, comparing each group with people who have a healthy BMI. In this case, it made more sense to use the threefold decomposition, because the coefficient and composition effects were both based on the baseline group: individuals with a healthy BMI.
    Twofold decompositions, however, are more common, as they align better with the original design of the Oaxaca-Blinder (OB) decomposition.
    Regarding the "total", it is the same as the Raw difference in your output.
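    That identity can be checked by hand. A minimal sketch, using only the figures copied from the output in post #1 (nothing here calls nldecompose itself): within each Omega block, Char and Coef should add up to the Raw difference, apart from rounding in the last printed digit.

    ```stata
    * Omega = 1: Char + Coef, to be compared with Raw = -.1004237
    display (-.0078933) + (-.0925304)
    * Omega = 0: Char + Coef, same comparison
    display (-.0064641) + (-.0939597)
    ```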
    HTH
    Fernando



    • #3
      Thank you very much for explaining that Fernando.

      In this case, would it then be correct to say that the proportion of the difference attributable to cohort replacement (differences in mean characteristics) comprises .7 percentage points of the 100 total percentage point difference, with the remaining portion, 9.3 percentage points, attributable to intra-cohort change (differences in coefficients)?

      If I may also ask another question – which results do we look at (omega 1 or omega 0 – different specifications of the omega matrix)?



      • #4
        I think you meant to say that characteristics explain 7% and coefficients 93%, and that is the language I would use.
        Regarding which result to choose: I would just report both, and state which one you are referring to when giving numbers. After all, they give you very similar results.
        Again, there is no rule for which one to choose, as long as you are consistent when explaining the results.
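        To make the two readings explicit, here is a small sketch using the Omega = 1 figures from post #1. The scalar names are made up for illustration, and it assumes the Coef. column is on the probability scale, as the size of the Raw difference suggests.

        ```stata
        * Omega = 1 figures copied from the output in post #1
        * (d_raw, d_char, d_coef are illustrative names, not nldecompose results)
        scalar d_raw  = -.1004237
        scalar d_char = -.0078933
        scalar d_coef = -.0925304
        * Share of the raw gap, in percent (the Percentage column)
        display 100 * d_char / d_raw
        display 100 * d_coef / d_raw
        * Size of each component in percentage points of probability
        display 100 * d_char
        display 100 * d_coef
        ```

        The first pair of lines gives shares of the gap; the second pair gives percentage points, which are to be read against a raw gap of about 10 percentage points, not 100.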



        • #5
          Thank you for taking the time to explain Fernando.

          - Sherine



          • #6
            Hello Sherine,
            Could you please help me with the Oaxaca decomposition? I am struggling with this technique. Could you clarify the variable format, i.e. what kind of format is required in the dataset? For example, for gender, should the values be entered in the datasheet as male and female, or as 0 and 1?
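            On the format question: Stata estimation commands such as logit only accept numeric variables, so a gender variable stored as text would need converting first. A minimal sketch, assuming a hypothetical string variable named gender that holds "male" and "female" (the variable name and the spellings are assumptions, not taken from any dataset in this thread):

            ```stata
            * Hypothetical: gender is a string variable containing "male"/"female"
            * Build a 0/1 indicator; observations with an empty string stay missing
            generate byte female = (gender == "female") if gender != ""
            * Cross-check the new indicator against the original values
            tabulate gender female, missing
            ```

            The resulting 0/1 variable is then in the same form as Period in post #1, which is coded 0/1 and carries value labels for display.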
