  • Decomposing an Effect

    Hello. I have been playing around with Stata to learn more statistics. I saw the shipped data sets and wanted to look at the relationship between a person's wages being above the median and race.

    Code:
    clear all
    sysuse nlsw88
    gen wagemedian = (wage > 6)
    gen black = race == 2
    When I run this regression, I see a negative coefficient on being Black:

    Code:
    reg wagemedian black
    I then thought that location must explain wages, but I was surprised that controlling for it also narrowed the wage difference:

    Code:
    reg wagemedian black south
    When I then regressed black on south, I saw that Black individuals are more likely to live in the south:

    Code:
    reg black south
    And that the south pays lower wages

    Code:
    reg wagemedian south
    How can I isolate how much of the relationship between a person being Black and their wages being above the median is due to the higher likelihood for Black individuals to live in the south versus Black individuals receiving lower wages regardless of location? In other words, how much of the first relationship "reg wagemedian black" is due to the higher likelihood of living in the south for Black individuals versus a general effect?

    Also, how can I isolate this relationship between black and wagemedian as I add more variables to the equation, such as the industry people work in?

    Code:
    reg wagemedian black south i.industry
    Last edited by Laura Freds; 04 Aug 2022, 06:53.
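
    A side note on the cutoff: the code above hard-codes 6 as the median. A minimal sketch, assuming the goal is the sample median of wage, reads it from summarize instead. The if qualifier keeps missing wages from being coded as 1, since Stata treats missing as larger than any number. Results from this version need not match the output shown later in the thread, which uses the cutoff of 6.

    Code:
    sysuse nlsw88, clear
    quietly summarize wage, detail
    // r(p50) holds the sample median after summarize, detail
    gen byte wagemedian = wage > r(p50) if !missing(wage)
    // check that the split is close to half and half
    tab wagemedian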

  • #2
    here is one option:

    https://stats.oarc.ucla.edu/stata/fa...e-sem-command/

    ---------------------------------
    Maarten L. Buis
    University of Konstanz
    Department of history and sociology
    box 40
    78457 Konstanz
    Germany
    http://www.maartenbuis.nl
    ---------------------------------



    • #3
      Laura:
      you might also be interested in:
      Code:
      . reg wagemedian i.black##i.south
      
            Source |       SS           df       MS      Number of obs   =     2,246
      -------------+----------------------------------   F(3, 2242)      =     28.12
             Model |  20.2727294         3  6.75757646   Prob > F        =    0.0000
          Residual |  538.722818     2,242  .240286716   R-squared       =    0.0363
      -------------+----------------------------------   Adj R-squared   =    0.0350
             Total |  558.995548     2,245   .24899579   Root MSE        =    .49019
      
      ------------------------------------------------------------------------------
        wagemedian | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
      -------------+----------------------------------------------------------------
           1.black |   .0569339   .0369305     1.54   0.123    -.0154877    .1293555
                   |
             south |
            South  |  -.0639549   .0253365    -2.52   0.012    -.1136403   -.0142694
                   |
       black#south |
          1#South  |  -.2437816   .0492987    -4.94   0.000    -.3404575   -.1471058
                   |
             _cons |   .5859232   .0148203    39.54   0.000     .5568603    .6149861
      ------------------------------------------------------------------------------
      
      .
      Kind regards,
      Carlo
      (Stata 19.0)



      • #4
        Originally posted by Carlo Lazzaro
        Laura:
        you might also be interested in:
        Code:
        . reg wagemedian i.black##i.south
        Thanks, Carlo. I am new to stats; what does the above tell us? Also, depending on your response: if I had individual states, would you interact each state with black? Is there another way to explain how much of the variation in wagemedian is due to black versus the relationship between black and state?



        • #5
          What Carlo is proposing is an interaction, what I proposed is a mediation. Those are two completely different things.

          An interaction says the effect of being black is different depending on whether one lives in the south or in the non-South.

          A mediation says a part of the effect of race on income is due to the fact that blacks are more likely to live in the south and in the south you get a lower income.
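
          To see the interaction reading in numbers, here is a sketch that assumes wagemedian and black are built as in #1 (cutoff of 6). margins turns the coefficients from #3 into a predicted share for each cell and into a Black/non-Black gap for each region. From the output in #3, that gap is about .057 outside the south and .0569 - .2438 = -.187 in the south.

          Code:
          sysuse nlsw88, clear
          gen byte wagemedian = wage > 6 if !missing(wage)
          gen byte black = race == 2 if !missing(race)

          reg wagemedian i.black##i.south

          // predicted share above the cutoff in each black-by-south cell
          margins black#south

          // Black/non-Black gap, separately outside and inside the south
          margins south, dydx(black)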
          Maarten L. Buis



          • #6
            Originally posted by Maarten Buis
            What Carlo is proposing is an interaction, what I proposed is a mediation. Those are two completely different things.

            An interaction says the effect of being black is different depending on whether one lives in the south or in the non-South.

            A mediation says a part of the effect of race on income is due to the fact that blacks are more likely to live in the south and in the south you get a lower income.
            Thanks for that. I am new to stats so sorry for any lack of clarity.

            I want to explain how much of the effect of black on wages is due to being Black versus the fact that Black individuals are more likely to live in the south. Is that a mediation?

            Is there a way to get this estimate in an OLS regression? I am most comfortable with OLS right now.

            Also, if I had each state (50 in all) and not just a binary location (south), how would I similarly discuss how much of the estimate of black is due to location?
            Last edited by Laura Freds; 04 Aug 2022, 08:49.



            • #7
              Laura:
              your replies clarify.
              Follow Maarten's approach, as it is now clearer that what you're after is a mediation (as per Maarten's suggestion), not an interaction (as per my previous proposal).
              Kind regards,
              Carlo
              (Stata 19.0)



              • #8
                Code:
                // open example data
                sysuse nlsw88, clear
                
                // prepare the data
                
                gen byte black = race == 2 if !missing(race)
                label variable black "respondent's race"
                label define black 0 "not black" ///
                                   1 "black"
                label value black black
                
                // collect the bits
                tempname a b direct indirect total
                
                reg wage i.black
                scalar `total' = _b[1.black]
                reg south i.black
                scalar `a' = _b[1.black]
                reg wage i.black i.south
                scalar `b' = _b[1.south]
                scalar `direct' = _b[1.black]
                scalar `indirect' = `a' * `b'
                
                // admire the results
                di as txt "The total effect is    " as result `total'
                di as txt "The direct effect is   " as result `direct'
                di as txt "The indirect effect is " as result `indirect'
                di as txt "So " as result `indirect'/`total'*100 as txt "% of the total effect can be explained by south"
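
                Why these pieces add up: in OLS, when all three regressions use the same observations, total = direct + a*b holds exactly, so the percentage in the last line is a share of the total coefficient. A quick check, as a sketch with plain scalars (the names touse, eff_total, eff_direct, path_a and path_b are mine, not from the code above):

                Code:
                sysuse nlsw88, clear
                gen byte black = race == 2 if !missing(race)

                // one common sample for all three regressions
                gen byte touse = !missing(wage, south, black)

                quietly reg wage i.black if touse
                scalar eff_total = _b[1.black]
                quietly reg south i.black if touse
                scalar path_a = _b[1.black]
                quietly reg wage i.black i.south if touse
                scalar eff_direct = _b[1.black]
                scalar path_b = _b[1.south]

                // the two displayed numbers should agree up to rounding
                di as txt "total        = " as result eff_total
                di as txt "direct + a*b = " as result eff_direct + path_a*path_b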
                Maarten L. Buis



                • #9
                  Thank you, Professor Buis. This is very helpful for my learning. How does the above change if we replace south with state, where state has more than two values?



                  • #10
                    Originally posted by Maarten Buis
                    Thanks again. How would this work when a variable has multiple values? For example, if south had 4 values instead of 2.
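
                    One way to handle a mediator with more than two categories, offered as a sketch rather than a definitive answer: take the difference between the total and the direct coefficient, both estimated on one common sample. In OLS this difference equals the sum of the a*b products over the category dummies, so it is the same decomposition as in #8. nlsw88 has no multi-category location variable, so industry (from #1) stands in for state here; touse and the scalar names are hypothetical.

                    Code:
                    sysuse nlsw88, clear
                    gen byte black = race == 2 if !missing(race)

                    // full model first, so both models use the same observations
                    quietly reg wage i.black i.industry
                    gen byte touse = e(sample)
                    scalar eff_direct = _b[1.black]

                    // total effect on that same sample
                    quietly reg wage i.black if touse
                    scalar eff_total = _b[1.black]

                    // indirect effect by the difference method
                    scalar eff_indirect = eff_total - eff_direct

                    di as txt "The total effect is    " as result eff_total
                    di as txt "The direct effect is   " as result eff_direct
                    di as txt "The indirect effect is " as result eff_indirect
                    di as txt "So " as result eff_indirect/eff_total*100 as txt "% of the total effect can be explained by industry"

                    This gives point estimates only; it does not produce a standard error for the indirect effect.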
