
  • #16
    You could see if bootstrapping the standard errors via
    Code:
    bootstrap :oaxaca logGRSSWK $wageeq if inlist(DISTYPE,1,4) & quarter == 1, by(DISTYPE) model1(heckman, twostep select($seleq)) model2(heckman, twostep select($seleq)) weight(0) noisily relax
    gives you different results.
    You could also run a simple t-test to see if there are any differences between the groups:
    Code:
    ttest logGRSSWK if inlist(DISTYPE,1,4) & quarter==1,by(DISTYPE)
    Maybe there are no wage differences to begin with.
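
    One caveat about the t-test: with by(), ttest assumes equal variances in the two groups unless told otherwise. If the groups differ a lot in size or spread, a minimal sketch like the following (same variables and sample restriction as above, nothing else assumed) checks the variances and repeats the test without that assumption.

    ```stata
    * Compare the variances of logGRSSWK across the two groups
    sdtest logGRSSWK if inlist(DISTYPE,1,4) & quarter == 1, by(DISTYPE)

    * Repeat the t-test without assuming equal variances
    ttest logGRSSWK if inlist(DISTYPE,1,4) & quarter == 1, by(DISTYPE) unequal
    ```

    If the two versions of the t-test disagree, the unequal-variance result is probably the safer one to report.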



    • #17
      Originally posted by Sven-Kristjan Bormann
      You could see if bootstrapping the standard errors via
      Code:
      bootstrap :oaxaca logGRSSWK $wageeq if inlist(DISTYPE,1,4) & quarter == 1, by(DISTYPE) model1(heckman, twostep select($seleq)) model2(heckman, twostep select($seleq)) weight(0) noisily relax
      gives you different results.
      You could also run a simple t-test to see if there are any differences between the groups:
      Code:
      ttest logGRSSWK if inlist(DISTYPE,1,4) & quarter==1,by(DISTYPE)
      Maybe there are no wage differences to begin with.
      Thank you very much for the suggestions.
      Running the latter t-test suggested the wage difference was significant at the 5% level:

      HTML Code:
       diff = mean(WLD) - mean(Non-disa)                             t =  -1.9896
      Ho: diff = 0                                     degrees of freedom =      564
      
          Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
       Pr(T < t) = 0.0236         Pr(|T| > |t|) = 0.0471          Pr(T > t) = 0.9764
      Thus I believe this wage decomposition is warranted.

      However bootstrapping the standard errors led to:
      HTML Code:
      Bootstrap replications (50)
      ----+--- 1 ---+--- 2 ---+--- 3 ---+--- 4 ---+--- 5 
      xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    50
      insufficient observations to compute bootstrap standard errors
      no results will be saved
      r(2000);
      Perhaps this is because there are only 58 observations for WLD in quarter 1, so the insignificant p-values could be attributed to the small sample?
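
      A row made up entirely of x suggests that every replication failed (the command returned an error or, in some Stata versions, a requested statistic was missing), which would explain why no bootstrap standard errors could be computed. One way to see why, sketched here on the assumption that $wageeq and $seleq are defined as elsewhere in this thread, is to count the usable observations per group and then repeat a handful of replications with the bootstrap prefix's own noisily option, which displays the output of each replication. The reps() and seed() values are illustrative choices only.

      ```stata
      * Run from a do-file so that the /// line continuations work

      * How many observations with a non-missing outcome does each group have?
      tabulate DISTYPE if inlist(DISTYPE,1,4) & quarter == 1 & !missing(logGRSSWK)

      * A few replications only, with the output of each replication displayed
      bootstrap, reps(5) seed(12345) noisily: ///
          oaxaca logGRSSWK $wageeq if inlist(DISTYPE,1,4) & quarter == 1, ///
          by(DISTYPE) model1(heckman, twostep select($seleq)) ///
          model2(heckman, twostep select($seleq)) weight(0) relax
      ```

      The error message shown inside the replications is usually more informative than the final r(2000).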



      • #18
        Alternatively, could it be because of the way I am treating categorical variables? I.e.:

        Code:
         global wageeq "normalize(b.dWHITE1 dWHITE2) normalize(b.dAGE1 dAGE2 dAGE3 dAGE4 dAGE5)
        normalize(dRESIDENCE1 dRESIDENCE2 dRESIDENCE3 dRESIDENCE4 dRESIDENCE5 b.dRESIDENCE6)..."
        
        global seleq "dWHITE1 dWHITE2 dAGE1 dAGE2 dAGE3 dAGE4 dAGE5 dRESIDENCE1 dRESIDENCE2 dRESIDENCE4 dRESIDENCE5 dRESIDENCE6..."
        Sorry, I am just trying to get my head around one problem after another, as I am unable to change datasets.



        • #19
          (Post #19 was cleared via editing: it stemmed from a coding error that, once corrected, made the post redundant.) My queries in #17 and #18 still stand, and any advice or clarity would be greatly appreciated.
          Last edited by Will Murphy; 08 Apr 2020, 05:39.



          • #20
            Perhaps this is due to there only being 58 observations for WLD in quarter 1 and thus the insignificant p-values could be perhaps attributed to a small sample?
            This could indeed be the case.

            Alternatively, could it be because of the way I am treating categorical variables?
            Without having seen your dataset, this is difficult to answer. However, the difference reported in your post #15 is the raw difference between the groups, based only on individuals with no missing values on any of the variables.
            Maybe you can rerun the t-test with something like this:
            Code:
            ttest logGRSSWK if inlist(DISTYPE,1,4) & quarter==1 & !mi(WHITE,AGE,RESIDENCE),by(DISTYPE)
            and see if the difference is still significant. You can also add more of your independent variables to !mi(...).
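
            If the list inside !mi(...) grows long, one option is to build a complete-case flag once and reuse it. This is only a sketch: nmiss and touse are hypothetical variable names, and the covariate list should be extended to whatever actually enters your wage equation.

            ```stata
            * Count missing values across the outcome and covariates for each observation
            egen byte nmiss = rowmiss(logGRSSWK WHITE AGE RESIDENCE)

            * Flag complete cases in the two groups of interest in quarter 1
            generate byte touse = inlist(DISTYPE,1,4) & quarter == 1 & nmiss == 0

            * Group sizes and the t-test on exactly those observations
            tabulate DISTYPE if touse
            ttest logGRSSWK if touse, by(DISTYPE)
            ```

            The tabulate line also shows how many observations per group survive the restriction, which matters with a sample this small.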

            A side note: you could convert your variable names to lowercase, which would make typing the commands easier for both of us.
            Code:
            rename *,lower



            • #21
              Originally posted by Sven-Kristjan Bormann
              Maybe you can rerun the t-test with something like this:
              Code:
              ttest logGRSSWK if inlist(DISTYPE,1,4) & quarter==1 & !mi(WHITE,AGE,RESIDENCE),by(DISTYPE)
              and see if the difference is still significant. You can also add more of your independent variables to !mi(...).

              A side note: you could convert your variable names to lowercase, which would make typing the commands easier for both of us.
              Code:
              rename *,lower
              Thanks for the advice. All my posts in this thread have used a sample restricted to males (i.e. dropping female observations). Including both genders (i.e. adding a gender dummy variable to my wage and selection equations) has improved the significance of the estimates to an extent and provides some form of a solution.

              When I ran the above t-tests, with and without the gender control variable, adding variables made no difference at all to the significance of the difference, which further points me towards a small-sample problem.

              Thank you for the invaluable help; it is greatly appreciated. All the best.
              Last edited by Will Murphy; 08 Apr 2020, 14:44.



              • #22
                Originally posted by Sven-Kristjan Bormann
                This could indeed be the case.

                Without having seen your dataset, this is difficult to answer. However, the difference reported in your post #15 is the raw difference between the groups, based only on individuals with no missing values on any of the variables.
                Maybe you can rerun the t-test with something like this:
                Code:
                ttest logGRSSWK if inlist(DISTYPE,1,4) & quarter==1 & !mi(WHITE,AGE,RESIDENCE),by(DISTYPE)
                and see if the difference is still significant. You can also add more of your independent variables to !mi(...).

                A side note: you could convert your variable names to lowercase, which would make typing the commands easier for both of us.
                Code:
                rename *,lower
                Apologies, if I may ask one more question.

                Much of the related literature either estimates separately by gender or assesses only one gender, to avoid including the effects of gender in its measures. I did that previously by assessing only males but, as I mentioned, adding females to my wage and selection equations improves the significance of my estimates considerably, i.e.
                Code:
                global wageeq "dAGE1 dAGE2 dAGE3...dFEMALE1 dFEMALE2"

                By assessing how much gender (i.e. dFEMALE1 dFEMALE2) contributes to the explained gap (which appears to be relatively insignificant, in both coefficient and p-value), does this not account for the effects of gender? If it doesn't, are there any alternatives, e.g. introducing interaction terms?
                I ask because I wouldn't want to account for only one gender, or each gender separately, and obtain very insignificant p-values again. Many thanks.
                Last edited by Will Murphy; 09 Apr 2020, 00:49.



                • #23
                  Why are you concerned with the lack of significance? Your sample is rather small, so I would expect to see insignificant values, especially if the standard deviations of key variables are large.
                  I would calculate the decomposition separately by gender to avoid dealing directly with the gender wage gap.
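
                  A rough sketch of what that could look like, run from a do-file. It assumes a hypothetical 0/1 indicator named female (not a variable from your dataset as posted) and that the gender dummies have been taken out of $wageeq and $seleq; everything else is your own command from earlier in the thread.

                  ```stata
                  * Hypothetical: female is 0 for men and 1 for women
                  forvalues g = 0/1 {
                      display as text _n "Decomposition for female == `g'"
                      oaxaca logGRSSWK $wageeq ///
                          if inlist(DISTYPE,1,4) & quarter == 1 & female == `g', ///
                          by(DISTYPE) model1(heckman, twostep select($seleq)) ///
                          model2(heckman, twostep select($seleq)) weight(0) noisily relax
                  }
                  ```

                  Each subsample will be smaller still, so wide confidence intervals are to be expected.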
                  By assessing how much gender (i.e. dFEMALE1 dFEMALE2) contributes to the explained gap (which appears to be relatively insignificant (in coefficient and p-value)) does this not account for the effects of gender? If this doesn't, are there any alternatives, e.g. introducing interaction terms?
                  I would need to see your results to say anything meaningful. Of course, you can try out different specifications and see what happens as part of a robustness check or something similar.

                  Because I wouldn't want to only account for one/each gender and obtain very insignificant p-values again.
                  If you obtain large p-values and your main specification is (theoretically) reasonable, then I would report these results.



                  • #24
                    Originally posted by Sven-Kristjan Bormann

                    If you obtain large p-values and your main specification is (theoretically) reasonable, then I would report these results.
                    Thank you very much. Sorry for my incompetence, but just to check quickly: by my main specification being reasonable, do you mean whether oaxaca is suitable for what I wish to do (i.e. decompose the wage gap)?



                    • #25
                      By reasonable I mean that you included all variables from your dataset that seem sensible to include.
                      But it could also be that the Blinder-Oaxaca decomposition itself is not suitable for your research problem. Like an ordinary OLS regression, the Blinder-Oaxaca decomposition assumes a relationship between the variables that is linear in the parameters, and it evaluates the effects at the means of the variables.
                      Maybe there are differences in other parts of the distribution but not at the mean.
                      I don't want to scare you; I am just pointing out what else to consider. The BO decomposition is usually reasonable and is often used for wage decompositions.
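
                      For a quick descriptive look beyond the mean, something like the following could be tried. It is only a sketch using the variables already named in this thread; it compares quantiles of the raw log wage across the two groups and is not a decomposition.

                      ```stata
                      * Mean, spread and selected percentiles of logGRSSWK by group
                      tabstat logGRSSWK if inlist(DISTYPE,1,4) & quarter == 1, ///
                          by(DISTYPE) statistics(n mean sd p10 p25 p50 p75 p90)

                      * Two-sample Kolmogorov-Smirnov test of equality of the distributions
                      ksmirnov logGRSSWK if inlist(DISTYPE,1,4) & quarter == 1, by(DISTYPE)
                      ```

                      If the percentiles diverge mainly in the tails, that would be a hint that the gap is not just a difference in means.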



                      • #26
                        Originally posted by Sven-Kristjan Bormann
                        By reasonable I mean that you included all variables from your dataset that seem sensible to include.
                        But it could also be that the Blinder-Oaxaca decomposition itself is not suitable for your research problem. Like an ordinary OLS regression, the Blinder-Oaxaca decomposition assumes a relationship between the variables that is linear in the parameters, and it evaluates the effects at the means of the variables.
                        Maybe there are differences in other parts of the distribution but not at the mean.
                        I don't want to scare you; I am just pointing out what else to consider. The BO decomposition is usually reasonable and is often used for wage decompositions.
                        No, that makes perfect sense, thank you. I would use the extension of the B-O decomposition, i.e. the RIF decomposition, but with a relatively small sample this is likely to lead to further problems. Thank you very much for all your help.
