  • Heckman selection model with Blinder-Oaxaca Decomposition

    I am trying to decompose the log wage gap between non-disabled (DISTYPE = 1) and work-limited disabled (DISTYPE = 2) males into 'explained' and 'unexplained' components using a Blinder-Oaxaca decomposition that accounts for those unemployed (GRSSWK = 0) via a Heckman selection correction. I have looked at online resources, including Ben Jann's, but none of the changes I have tried has allowed me to run this.

    I created dummy variables for each value of each categorical variable and then tried applying this (simplified) command:

    Code:
     oaxaca logGRSSWK1 i.WHITE i.dAGES11 i.dAGES12 i.dAGES13 i.dRES1 i.dRES2 i.dRES3 i.dREGWKR1 i.dREGWKR2 i.dIND1 i.dIND1 if MALE == 1 & inlist(DISTYPE,1,2) model2(heckman, twostep select(lfp = i.WHITE i.dAGES11 i.dAGES12 i.dAGES13 i.dRES1 i.dRES2 i.dRES3 i.MARRIAGE1 i.MARRIAGE2)) pooled
    (Note: some dummy variables included in the wage equation = 0 for a categorical variable, and the lfp equation excludes any industry variables.) Stata reports 'option by() required'; however, I am not convinced that Stata will run this even once that problem is addressed.

    So I was wondering if any clarity could be provided whether I am anywhere near the correct code for this? Many thanks.

  • #2
    Just to clarify, I have been able to run an uncorrected decomposition. For the example above,
    Code:
     
     xi: oaxaca logGRSSWK1 i.WHITE i.dAGES11 i.dAGES12 i.dAGES13 i.dRES1 i.dRES2 i.dRES3 i.dREGWKR1 i.dREGWKR2 i.dIND1 i.dIND1 if MALE == 1 & inlist(DISTYPE,1,2), by(DISTYPE) pooled relax
    However, I cannot work out how to run the corrected decomposition via Heckman (whereby a labour force participation (lfp) equation is modelled).
    Any help would be greatly appreciated regarding its integration into the above code, thank you.


    • #3
      Following from #2, another question regards time. All my variables (except ethnicity and gender) have 1 or 5 at the end to detail quarter 1 or quarter 5. I ran the code in #2 (that included 'pooled') after dropping observations in quarter 5.

      Am I best to keep the quarter 5 observations, reshape my data and specify 'by(quarter)' if I want to assess how the decomposition changes over time using 'pooled'? And, if so, how would I do this given that I have already specified 'by(DISTYPE)' in #2?

      Or can I just run Q1 and Q5 separately, using 'pooled' both times, and assess the differences in the wage decompositions (especially the explained and unexplained parts)?

      If anybody could help with #2 and this, it would be greatly appreciated.
      Last edited by Will Murphy; 04 Apr 2020, 06:40.


      • #4
        Your code in #1 can't work because the comma that separates the if condition from the options is missing after inlist(), and the required by() option is not specified.
        Code:
         
          oaxaca logGRSSWK1 i.WHITE i.dAGES11 i.dAGES12 i.dAGES13 i.dRES1 i.dRES2 i.dRES3 i.dREGWKR1 i.dREGWKR2 i.dIND1 i.dIND1 if MALE == 1 & inlist(DISTYPE,1,2), by(DISTYPE) model2(heckman, twostep select(lfp = i.WHITE i.dAGES11 i.dAGES12 i.dAGES13 i.dRES1 i.dRES2 i.dRES3 i.MARRIAGE1 i.MARRIAGE2)) pooled
        If you want to compare changes over time, then you should run the Smith-Welch decomposition. In Stata, you can use the smithwelch-command which is also by Ben Jann. The code for the decomposition in your case could look like this:
        Code:
        ssc install smithwelch // in case you don't have the command yet
        heckman logGRSSWK1 i.WHITE i.dAGES11 i.dAGES12 i.dAGES13 i.dRES1 i.dRES2 i.dRES3 i.dREGWKR1 i.dREGWKR2 i.dIND1 i.dIND1 if MALE == 1 & DISTYPE==1 & quarter1 /* assuming this is your time variable */, select(lfp = i.WHITE i.dAGES11 i.dAGES12 i.dAGES13 i.dRES1 i.dRES2 i.dRES3 i.MARRIAGE1 i.MARRIAGE2) twostep
        est store dist1q1

        heckman logGRSSWK1 i.WHITE i.dAGES11 i.dAGES12 i.dAGES13 i.dRES1 i.dRES2 i.dRES3 i.dREGWKR1 i.dREGWKR2 i.dIND1 i.dIND1 if MALE == 1 & DISTYPE==1 & quarter5 /* assuming this is your time variable */, select(lfp = i.WHITE i.dAGES11 i.dAGES12 i.dAGES13 i.dRES1 i.dRES2 i.dRES3 i.MARRIAGE1 i.MARRIAGE2) twostep
        est store dist1q5

        heckman logGRSSWK1 i.WHITE i.dAGES11 i.dAGES12 i.dAGES13 i.dRES1 i.dRES2 i.dRES3 i.dREGWKR1 i.dREGWKR2 i.dIND1 i.dIND1 if MALE == 1 & DISTYPE==2 & quarter1 /* assuming this is your time variable */, select(lfp = i.WHITE i.dAGES11 i.dAGES12 i.dAGES13 i.dRES1 i.dRES2 i.dRES3 i.MARRIAGE1 i.MARRIAGE2) twostep
        est store dist2q1

        heckman logGRSSWK1 i.WHITE i.dAGES11 i.dAGES12 i.dAGES13 i.dRES1 i.dRES2 i.dRES3 i.dREGWKR1 i.dREGWKR2 i.dIND1 i.dIND1 if MALE == 1 & DISTYPE==2 & quarter5 /* assuming this is your time variable */, select(lfp = i.WHITE i.dAGES11 i.dAGES12 i.dAGES13 i.dRES1 i.dRES2 i.dRES3 i.MARRIAGE1 i.MARRIAGE2) twostep
        est store dist2q5

        smithwelch dist1q1 dist2q1 dist1q5 dist2q5


        • #5
          Originally posted by Sven-Kristjan Bormann View Post
          Thank you very much for the helpful reply and code, Sven. Regarding
          Code:
           smithwelch dist1q1 dist2q1 dist1q5 dist2q5
          it resulted in 'invalid syntax'. I then double-checked that the problem was not on my side via:
          Code:
           smithwelch _est_dist1q1 _est_dist2q1 _est_dist1q5 _est_dist2q5
          which also did not work. I checked the estimates stored and they're 1s and 0s, is this fine?

          Could any help be provided to resolve this? It would be greatly appreciated.


          • #6
            You did not give an example of your data, so my code is just an idea of what might work. In particular, I have no idea what your quarter variable looks like, so that might be a source of error.
            I checked the estimates stored and they're 1s and 0s, is this fine?
            I don't understand what you mean.


            • #7
              Originally posted by Sven-Kristjan Bormann View Post
              You did not give an example of your data, so my code is just an idea of what might work. In particular, I have no idea what your quarter variable looks like, so that might be a source of error.
              I don't understand what you mean.
              My apologies. An example of my dataset is presented below. Variables originally ended in e.g. 5 to denote the quarter 5 value, so I had to reshape to long format.

              Code:
              * Example generated by -dataex-. To install: ssc install dataex
              clear
              input float id byte quarter float logGRSSWK byte DISTYPE
              32 1 6.175867 4
              32 5 6.263398 4
              38 1 5.749393 4
              38 5 5.910797 4
              end
              label values DISTYPE DISTYPE
              label def DISTYPE 4 "Non-disabled", modify

              Please disregard the question regarding 1s and 0s.


              • #8
                If I may, and to avoid taking up much more of your time, I would like to ask two final follow-up questions about your oaxaca command, because I wish to compare the results with Smith-Welch and assess discrimination.

                1. After taking your advice on board from #4, I also wanted to implement a 'by quarter' command, however the following two commands I tried each proved unsuccessful:

                Code:
                 by quarter DISTYPE, sort: oaxaca $wageeq if inlist(DISTYPE,1,2), model1(heckman, twostep select($seleq)) model2(heckman, twostep select($seleq)) pooled relax
                
                oaxaca $wageeq if inlist(DISTYPE,1,2), by(DISTYPE quarter) model1(heckman, twostep select($seleq)) model2(heckman, twostep select($seleq)) pooled relax
                The former said that by() is required, whilst the latter said that too many variables were specified in by().
                Is there any solution to this problem?

                2. I have naively used 'pooled' in my oaxaca commands to ensure the twofold decomposition occurs whereby I can obtain explained and unexplained components. I have read online resources to understand what pooled implies however they are difficult to apply to my current context. I was wondering if any clarity could be provided?

                Many thanks.


                • #9
                  Regarding your question number 1: I don't understand why you want to avoid the required by() option. The by() option takes exactly one binary variable as the group indicator.
                  The only way around this restriction is to rewrite the oaxaca command so that it allows more variables in the by() option, or to remove the requirement of the by() option.
                  Why don't you want to run oaxaca separately for each quarter, as in the code below?

                  Code:
                  oaxaca $wageeq if inlist(DISTYPE,1,2) & quarter==1, by(DISTYPE) model1(heckman, twostep select($seleq)) model2(heckman, twostep select($seleq)) pooled relax
                  oaxaca $wageeq if inlist(DISTYPE,1,2) & quarter==5, by(DISTYPE) model1(heckman, twostep select($seleq)) model2(heckman, twostep select($seleq)) pooled relax
                  Regarding your second question:
                  I have read online resources to understand what pooled implies however they are difficult to apply to my current context.
                  I don't understand what you mean. Why are they difficult to apply to your current context?
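
As a side note on the per-quarter runs, the two oaxaca calls could be wrapped in a loop. This is only a sketch: it assumes the globals $wageeq and $seleq are already defined as in the posts above, that quarter takes the values 1 and 5, and the stored-estimate names oax_q1 and oax_q5 are made up for illustration.

```stata
* Sketch only: $wageeq and $seleq are assumed to be defined beforehand,
* and quarter is assumed to take the values 1 and 5.
foreach q of numlist 1 5 {
    display as text _newline "Decomposition for quarter `q'"
    oaxaca $wageeq if inlist(DISTYPE,1,2) & quarter==`q', by(DISTYPE) ///
        model1(heckman, twostep select($seleq)) ///
        model2(heckman, twostep select($seleq)) pooled relax
    estimates store oax_q`q'   // hypothetical names for the stored results
}
```

Storing each run makes it possible to list the two decompositions side by side afterwards, for instance with estimates table oax_q1 oax_q5.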


                  • #10
                    Originally posted by Sven-Kristjan Bormann View Post
                    Regarding your second question: I don't understand what you mean. Why are they difficult to apply to your current context?
                    Thank you for the reply. I wish to assess the explained and unexplained components of the wage gap, which I believe means I would have to run a twofold oaxaca decomposition. I was blindly/ naively attempting to use 'pooled' to enable this without knowing what its inclusion does.

                    Jann (2008) states: "The pooled option also causes the coefficients from a pooled model to be used, but now the pooled model also contains a group membership indicator". Apologies if I am being foolish, but I don't understand in my context of assessing the male wage gap of Non-disabled (DISTYPE =1) and work-limited disabled (DISTYPE=2) (bearing in mind DISTYPE can also equal 3), what the pooled model would be? Because when I attempt to use this code:


                    Originally posted by Sven-Kristjan Bormann View Post
                    oaxaca $wageeq if inlist(DISTYPE,1,2) & quarter==1, by(DISTYPE) model1(heckman, twostep select($seleq)) model2(heckman, twostep select($seleq)) pooled relax
                    I get the issue of coexistence of factor-variable and time-series operators, despite generating dummy variables for categorical ones and implementing them in my wage and select equations, e.g:
                    Code:
                     tab AGES, gen(dAGES)
                    global wageeq "i.WHITE dAGES1 dAGES2 dAGES3..
                    So, using 'xi:' at the start but with 'noisily' instead of 'relax' (due to some zero variance coefficients in both models) works BUT only as a regression model when doing a threefold decomposition
                    Code:
                     xi: oaxaca logGRSSWK $wageeq if inlist(DISTYPE,1,4) & quarter==1, by(DISTYPE) model1(heckman, twostep select($seleq)) model2(heckman, twostep select($seleq)) noisily
                    and at the end of the regression coefficient tabulations I get:
                    Code:
                     dropped coefficients or zero variances encountered
                    specify -relax- to ignore
                    r(499);
                    When using the same code with 'relax noisily', as well as 'pooled noisily' for the twofold, I get the error:
                    Code:
                     Dependent variable never censored because of selection: 
                    model would simplify to OLS regression
                    r(498);

                    Is there anything I can do to at least run a valid threefold oaxaca with heckman, given that it is currently being treated as an ordinary regression model? For example, should I merge categories of the categorical variables so that no resulting dummy variable has a zero-variance coefficient? Thank you.
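
One thing that may be worth checking for the r(498) message: heckman reports it when no observation in the estimation sample is censored. The following is a hedged diagnostic sketch, not a confirmed fix. It assumes lfp is the 0/1 participation indicator from #1 and that the sample restriction is the one used in the oaxaca call above.

```stata
* Hypothetical check; lfp is assumed to be the participation indicator from #1.
* If select() is written as select(lfp = ...), selection is defined by lfp,
* so each DISTYPE group needs both lfp==1 and lfp==0 cases:
tabulate lfp DISTYPE if inlist(DISTYPE,1,4) & quarter==1, missing

* If select() lists only regressors, missing wages define the nonselected cases,
* so a count of zero here would mean nothing is censored:
count if missing(logGRSSWK) & inlist(DISTYPE,1,4) & quarter==1
```

If either check shows a group with no nonselected observations, the selection equation has nothing to estimate for that group, which would be consistent with the message above.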
                    Last edited by Will Murphy; 06 Apr 2020, 04:01.


                    • #11
                      To note: using a threefold oaxaca with heckman and 'relax' instead of 'noisily' does yield threefold oaxaca tables but not accounting for any changes that should be induced by heckman, stating that both model 1 and 2 have zero variance coefficients.
                      Also, apologies for any confusion: DISTYPE equals 1 if work-limited disabled, 2 if daily-activity-limited disabled and 4 if non-disabled. I know my earlier code suggested otherwise, but that was for simplification.
                      Last edited by Will Murphy; 06 Apr 2020, 05:39.


                      • #12
                        Originally posted by Sven-Kristjan Bormann View Post
                        Why don't you want to run oaxaca separate for each quarter like the code below?

                        Code:
                        oaxaca $wageeq if inlist(DISTYPE,1,2) & quarter==1, by(DISTYPE) model1(heckman, twostep select($seleq)) model2(heckman, twostep select($seleq)) pooled relax
                        oaxaca $wageeq if inlist(DISTYPE,1,2) & quarter==5, by(DISTYPE) model1(heckman, twostep select($seleq)) model2(heckman, twostep select($seleq)) pooled relax
                        Apologies Sven, I have been trying to solve my questions and can now ask a clearer question, if that is OK. Instead of using 'xi' I have used 'normalize' in my wage equation, so that by running the following:
                        Code:
                         oaxaca logGRSSWK $wageeq if inlist(DISTYPE,1,4) & quarter == 1, by(DISTYPE) ///
                             model1(heckman, twostep select($seleq)) model2(heckman, twostep select($seleq)) noisily relax
                        I obtain a satisfactory threefold decomposition (with heckman). The heckman step runs for models 1 and 2 (only model 1 is shown below):

                        HTML Code:
                        Model for group 1
                        note: dWHITE2 dropped because of collinearity
                        note: dAGE1 dropped because of collinearity
                        note: dEDUCATION3 dropped because of collinearity
                        note: dMARITALSTATUS3 dropped because of collinearity
                        note: dCHILDREN4 dropped because of collinearity
                        note: dRESIDENCE3 dropped because of collinearity
                        note: dWORKREGION2 dropped because of collinearity
                        note: dWORKREGION6 dropped because of collinearity
                        note: dEDUCATION2 dropped because of collinearity
                        
                        Heckman selection model -- two-step estimates   Number of obs     =        299
                        (regression model with sample selection)              Selected    =         62
                                                                              Nonselected =        237
                        
                                                                        Wald chi2(22)     =      49.38
                                                                        Prob > chi2       =     0.0007
                        
                        ---------------------------------------------------------------------------------
                              logGRSSWK |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        ----------------+----------------------------------------------------------------
                        logGRSSWK       |
                                dWHITE2 |   1.382135   .5484004     2.52   0.012     .3072904     2.45698
                                  dAGE2 |   .2769868   .5706989     0.49   0.627    -.8415625    1.395536
                                  dAGE3 |  -.2430792   .5757544    -0.42   0.673    -1.371537    .8853787
                                  dAGE4 |    .043933   .5370769     0.08   0.935    -1.008718    1.096584
                                  dAGE5 |   -.301713   .4878076    -0.62   0.536    -1.257798    .6543723
                            dRESIDENCE1 |  -.5625492   .3800708    -1.48   0.139    -1.307474    .1823758
                            dRESIDENCE2 |  -1.428457     .71982    -1.98   0.047    -2.839278   -.0176359
                            dRESIDENCE4 |   .1827752   .4717347     0.39   0.698    -.7418078    1.107358
                            dRESIDENCE5 |  -.5827875   .4025659    -1.45   0.148    -1.371802    .2062272
                           dWORKREGION3 |   .6196988   .7701736     0.80   0.421    -.8898136    2.129211
                           dWORKREGION4 |  -.8755692   .4103622    -2.13   0.033    -1.679864   -.0712741
                           dWORKREGION5 |  -.5952795   .4452076    -1.34   0.181     -1.46787    .2773113
                             dINDUSTRY3 |  -.3751538   .2356986    -1.59   0.111    -.8371145    .0868069
                             dINDUSTRY4 |  -.2147377   .2406301    -0.89   0.372    -.6863641    .2568887
                             dINDUSTRY5 |  -.0851482   .2495602    -0.34   0.733    -.5742771    .4039808
                             dINDUSTRY6 |   .1276264   .2239924     0.57   0.569    -.3113906    .5666435
                             dINDUSTRY7 |  -.1199129   .2576881    -0.47   0.642    -.6249722    .3851464
                             dINDUSTRY8 |  -.6659574   .4706148    -1.42   0.157    -1.588345    .2564308
                             dINDUSTRY9 |  -.4692162   .3326219    -1.41   0.158    -1.121143    .1827106
                            dEDUCATION1 |   .1984278   .1459934     1.36   0.174    -.0877141    .4845697
                            dJOBTENURE2 |  -.1833522   .2226105    -0.82   0.410    -.6196606    .2529563
                            dJOBTENURE3 |   .2315784   .1918933     1.21   0.228    -.1445255    .6076823
                                  _cons |   5.929038   .9664496     6.13   0.000     4.034831    7.823244
                        ----------------+----------------------------------------------------------------
                        select          |
                                dWHITE1 |  -.7267648   .7198301    -1.01   0.313    -2.137606    .6840764
                                  dAGE2 |   1.473688   .6904321     2.13   0.033     .1204661     2.82691
                                  dAGE3 |   1.410782   .6285173     2.24   0.025     .1789109    2.642653
                                  dAGE4 |   .8470128   .6171357     1.37   0.170     -.362551    2.056577
                                  dAGE5 |   .4545676   .6441714     0.71   0.480     -.807985     1.71712
                            dRESIDENCE1 |  -.4479788   .2637305    -1.70   0.089     -.964881    .0689235
                            dRESIDENCE2 |  -.2876624   .2768774    -1.04   0.299     -.830332    .2550073
                            dRESIDENCE4 |  -.0811826   .4403324    -0.18   0.854    -.9442182     .781853
                            dRESIDENCE5 |  -.2821816   .3914729    -0.72   0.471    -1.049454    .4850911
                            dRESIDENCE6 |  -.9099623   .3272936    -2.78   0.005    -1.551446   -.2684787
                            dEDUCATION1 |  -.0309179   .2009303    -0.15   0.878    -.4247341    .3628983
                        dMARITALSTATUS1 |   .4888559   .2661098     1.84   0.066    -.0327098    1.010422
                        dMARITALSTATUS2 |   .4727092   .3014175     1.57   0.117    -.1180582    1.063477
                             dCHILDREN1 |   -.772282   .4433357    -1.74   0.082    -1.641204      .09664
                             dCHILDREN2 |  -.2712228   .4939069    -0.55   0.583    -1.239262    .6968169
                             dCHILDREN3 |   .2120907   .5079802     0.42   0.676    -.7835321    1.207714
                             dCHILDREN5 |  -5.346187          .        .       .            .           .
                                  _cons |   -.868004   .6779183    -1.28   0.200    -2.196699    .4606914
                        ----------------+----------------------------------------------------------------
                        mills           |
                                 lambda |  -.4767588   .2459464    -1.94   0.053    -.9588049    .0052873
                        ----------------+----------------------------------------------------------------
                                    rho |   -0.84768
                                  sigma |  .56242887
                        ---------------------------------------------------------------------------------
                        (dRESIDENCE3 dWORKREGION2 dWORKREGION6 dEDUCATION2 dropped from model 1)
                        (model 1 has zero variance coefficients)
                        But the results for the oaxaca decomposition part (shown below) I believe only show the results for the selected observations of DISTYPE = 1 (62) and not the total number of observations:
                        HTML Code:
                        Blinder-Oaxaca decomposition                    Number of obs     =        594
                                                                          Model           =     linear
                        Group 1: DISTYPE = 1                              N of obs 1      =         62
                        Group 2: DISTYPE = 4                              N of obs 2      =        532
                        
                        ------------------------------------------------------------------------------
                           logGRSSWK |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                        overall      |
                             group_1 |   6.568103   .2910431    22.57   0.000     5.997669    7.138537
                             group_2 |   6.511034   .0700499    92.95   0.000     6.373738    6.648329
                          difference |   .0570693   .2993544     0.19   0.849    -.5296546    .6437932
                          endowments |  -.0384598   .0360136    -1.07   0.286    -.1090451    .0321256
                        coefficients |  -.0064642   .2993338    -0.02   0.983    -.5931478    .5802193
                         interaction |   .1019933   .0706072     1.44   0.149    -.0363943    .2403809
                        I was wondering if this is actually correct or not (I am under the impression it is the latter, perhaps resulting from both models having zero variance coefficients)?
                        Apologies once again.
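
For reference, the three rows labelled endowments, coefficients and interaction in the output above correspond to the standard threefold decomposition written from the viewpoint of group 2. The notation is an assumption introduced here: $\bar{X}_g$ is the vector of mean regressors and $\hat\beta_g$ the estimated coefficient vector for group $g$, and $\bar{Y}_g$ is the (here selection-adjusted) mean outcome.

```latex
\begin{aligned}
R &= \bar{Y}_1 - \bar{Y}_2 \\
  &= \underbrace{(\bar{X}_1 - \bar{X}_2)'\hat\beta_2}_{\text{endowments}}
   + \underbrace{\bar{X}_2'(\hat\beta_1 - \hat\beta_2)}_{\text{coefficients}}
   + \underbrace{(\bar{X}_1 - \bar{X}_2)'(\hat\beta_1 - \hat\beta_2)}_{\text{interaction}}
\end{aligned}
```

As a quick arithmetic check on the table, the three components add up to the reported difference: -0.0384598 - 0.0064642 + 0.1019933 = 0.0570693.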


                        • #13
                          But the results for the oaxaca decomposition part (shown below) I believe only show the results for the selected observations of DISTYPE = 1 (62) and not the total number of observations:
                          Which number of observations did you expect? I don't think you have shown a description of your dataset so far (e.g. the output of the describe command).
                          Maybe you also need to post the output from model 2 to give a complete picture of the situation.

                          At the moment, I am also not sure if you still want an answer to your previous questions.


                          • #14
                            Originally posted by Sven-Kristjan Bormann View Post
                            Which number of observations did you expect?
                            I may be wrong but I was under the impression that the Oaxaca part (the 2nd HTML code of #12) would have said 'N of obs 1 = 299' (the 62 selected and the 237 non selected but accounted for by Heckman (in the 1st HTML code #12))? I may be mistaken but I thought only observing the 62 selected disregards the use of heckman here to provide an estimation of the wages of the non selected (i.e those currently not in work)?

                            If I am right regarding this please let me know and I will show the description of the dataset/ whatever is needed.

                            Originally posted by Sven-Kristjan Bormann View Post
                            At the moment, I am also not sure if you still want an answer to your previous questions.
                            My apologies, my line of methodology is to compare the uncorrected with the corrected oaxaca (hence my thinking that my oaxaca finding from #12 using heckman will be no different to it without) and therefore was wondering what the use of 'pooled' in a code such as:
                            Code:
                             oaxaca $wageeq if inlist(DISTYPE,1,4) & quarter == 1, by(DISTYPE) pooled relax
                            implies? I understand it uses coefficients from a pooled model over both groups as reference coefficients. Am I right in thinking both groups here implies DISTYPE =1 and DISTYPE =4?

                            Any clarity regarding these two issues, especially the former, would be greatly appreciated. Thank you.
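
To make the role of the reference coefficients concrete, the twofold decomposition can be written as below. The notation is an assumption introduced here: $\bar{X}_g$ and $\hat\beta_g$ are the mean regressors and estimated coefficients of group $g$, and $\beta^{*}$ is the reference (nondiscriminatory) coefficient vector.

```latex
\begin{aligned}
\bar{Y}_1 - \bar{Y}_2
  &= \underbrace{(\bar{X}_1 - \bar{X}_2)'\beta^{*}}_{\text{explained}}
   + \underbrace{\bar{X}_1'(\hat\beta_1 - \beta^{*}) + \bar{X}_2'(\beta^{*} - \hat\beta_2)}_{\text{unexplained}}, \\
\beta^{*} &= w\,\hat\beta_1 + (1 - w)\,\hat\beta_2 \quad \text{under } \texttt{weight}(w)
\end{aligned}
```

With pooled, $\beta^{*}$ is instead taken from a single regression over both groups in the estimation sample, with the group indicator added as a regressor. Given the restriction if inlist(DISTYPE,1,4), those two groups would be DISTYPE = 1 and DISTYPE = 4. With weight(0), $\beta^{*} = \hat\beta_2$, so the unexplained part reduces to $\bar{X}_1'(\hat\beta_1 - \hat\beta_2)$.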


                            • #15
                              Sven, please forgive me for continuing to answer my own questions. My former query in #14 can be resolved by comparing the coefficients with and without heckman: whilst the number of observations in the oaxaca part stays the same, the coefficients for both groups change, which I believe is the result of heckman.

                              Regarding the latter, I am now aware of a problem whereby oaxaca restricts the sample for the pooled model to observations for which wages are not missing, so the pooled model does not work with heckman. Instead I will use e.g. weight(0).

                              So my only question in this post (and my final one in this thread) is that I obtain statistically insignificant p-values for nearly everything, especially the most important terms, e.g.:

                              Code:
                               oaxaca logGRSSWK $wageeq if inlist(DISTYPE,1,4) & quarter == 1, by(DISTYPE) ///
                                   model1(heckman, twostep select($seleq)) model2(heckman, twostep select($seleq)) ///
                                   weight(0) noisily relax
                              HTML Code:
                              Blinder-Oaxaca decomposition                    Number of obs     =        566
                                                                                Model           =     linear
                              Group 1: DISTYPE = 1                              N of obs 1      =         58
                              Group 2: DISTYPE = 4                              N of obs 2      =        508
                              
                              ------------------------------------------------------------------------------
                                 logGRSSWK |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                              -------------+----------------------------------------------------------------
                              overall      |
                                   group_1 |   6.452112   .2393385    26.96   0.000     5.983017    6.921207
                                   group_2 |   6.461553   .0648995    99.56   0.000     6.334352    6.588753
                                difference |  -.0094405   .2479816    -0.04   0.970    -.4954755    .4765945
                                 explained |  -.0314521   .0372299    -0.84   0.398    -.1044212    .0415171
                               unexplained |   .0220116   .2491664     0.09   0.930    -.4663457    .5103689

                              Is there anything I can do to test/ resolve this issue? Thanks.
