
  • Problem with coeff interpretation of a regression with interaction

    Hi, for the sake of my question I will use the following variables: dependent = lnwage, control1 = gender, control2 = education level

    I want to do a regression on log wages and the interaction between gender and edu level.

    Currently to run the regression I am using the following input

    reg lnwage i.gender##edu

    The base levels are gender: male and edu: level 1

    From my understanding, the interpretation of the coefficient on female#level 2 = -0.2

    would be that a female with a level 2 education receives 20% less than a male with a level 1 education, and a female with a level 3 education receives 15% less than a male with a level 1 education, etc., since male, level 1 is the base.

    However, I want to be able to get output that allows me to interpret the coeff as a female with a level 2 edu receives 20% less than a male with a level 2 edu and a female with a level 3 edu receives 15% less than a male with a level 3 edu and so on.

    Basically I'd like a regression in which a female's edu level is compared with a male of the same edu level, rather than with a male at the base edu level, for every edu level, including the base level 1.

    I hope this makes sense. I am new to stata and haven't interpreted statistical values in years.

  • #2
    Please let me know if the question doesn't make sense or the desired output isn't possible. But I'm sure there must be a way to do this. Thanks in advance.

    • #3
      First, your interpretation of the interaction coefficient is wrong. Second, you cannot get the kind of results you want from a regression, but you can get what you are looking for from the -margins- command following a regression.

      Going with your variables lnwage, gender, and edu, with the base levels of gender and edu both being 1, the coefficient of 2.gender#2.edu (which is the only interaction coefficient you will see in the regression output) is not the difference in lnwage between any of the combinations of gender and edu. Let's write your model as an equation; the algebra is a little easier if we code gender and edu as 0 and 1 rather than 1 and 2:

      lnwage = b0 + b1*gender + b2*edu + b3*gender*edu + error term

      Then you can make the following table:

      gender      edu          E(lnwage)
      0 (male)    0 (level 1)  b0
      0 (male)    1 (level 2)  b0 + b2
      1 (female)  0 (level 1)  b0 + b1
      1 (female)  1 (level 2)  b0 + b1 + b2 + b3

      It is then apparent that b3, the interaction coefficient, is not the difference between any two of these groups.
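      What b3 does measure is the difference-in-differences: how much the female-male gap changes between the two education levels. A small numeric sketch (the coefficient values below are made up purely for illustration) makes this concrete:

```python
# Cell means built from the table above, with made-up values
# for b0..b3 (hypothetical numbers, not from any real data).
b0, b1, b2, b3 = 2.5, -0.1, 0.3, -0.2

m_lvl1 = b0                    # male, edu level 1
m_lvl2 = b0 + b2               # male, edu level 2
f_lvl1 = b0 + b1               # female, edu level 1
f_lvl2 = b0 + b1 + b2 + b3     # female, edu level 2

# The female-male gap at each education level:
gap_lvl1 = f_lvl1 - m_lvl1     # = b1
gap_lvl2 = f_lvl2 - m_lvl2     # = b1 + b3

# b3 is the CHANGE in that gap across education levels,
# not the gap itself:
print(gap_lvl2 - gap_lvl1)     # equals b3, up to floating-point rounding
```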

      Next, you have also used the heuristic that a 0.2 change in lnwage corresponds to a 20% change in wage. That heuristic is based on an approximate formula that is only a good approximation when the change in lnwage is small, say < 0.1. If you were to use the exact formula, a coefficient of -.2 corresponds to about an 18% decrement in wage.
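      The exact proportional change implied by a log-point coefficient b is exp(b) - 1, and a quick computation shows how the heuristic drifts as the coefficient grows:

```python
import math

coef = -0.2  # coefficient on the log-wage scale

approx = coef                  # the "0.2 => 20%" heuristic
exact = math.exp(coef) - 1     # exact proportional change in wage

print(f"approximate: {approx:.1%}")  # -20.0%
print(f"exact:       {exact:.1%}")   # -18.1%

# The two agree well only for small coefficients:
small = -0.05
print(math.exp(small) - 1)           # about -0.0488, close to -0.05
```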

      As you can see, it is a bit complicated to see what is going on when reading the coefficients of an interaction regression. The -margins- command makes things much simpler. So if you run
      Code:
      regress lnwage i.gender##i.edu
      margins gender#edu
      you will see the expected values of lnwage in each combination of gender and edu. And you can also get the marginal effect of gender at each given edu level by running:
      Code:
      margins edu, dydx(gender)
      That is, you will see the difference between males at edu 1 and females at edu 1, and the difference between males at edu 2 and females at edu 2.

      I recommend you read https://www3.nd.edu/~rwilliam/stats2/l53.pdf. It's a very clear and thorough explanation of interaction models, from the excellent Richard Williams.



      • #4
        Hi Clyde,

        I am still confused about marginal effects and need your help, please.

        I am working on a Panel data model. I want to measure the marginal effect of the interaction terms FDX * INF, FDX * INFVOL , FDX2 * INF and FDX2 * INFVOL on GDP

        My model is a multiplicative interaction model:
        GDP = β1 FDX + β2 FDX2+ β3 INF + β4 INFVOL+ β5 FDX * INF + β6 FDX * INFVOL + β7 FDX2 * INF + β8 FDX2 * INFVOL + β9 INIGDPPC + β10 GOV + β11 GFCF + β12 TRD + β13 LBOR

        by examining the partial derivative of GDP, as follows:

        ∂GDP/∂FDX = β1 + 2 β2 FDX + β5 INF + β6 INFVOL + 2 β7 FDX * INF + 2 β8 FDX * INFVOL
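        This derivative can be sanity-checked numerically with a central finite difference; here is a small sketch with arbitrary placeholder coefficient values (not estimates from any model):

```python
# Finite-difference check of dGDP/dFDX for the FDX-dependent part of the
# model above. Coefficient values are arbitrary placeholders.
b = {1: 0.5, 2: -0.03, 5: -0.4, 6: 0.25, 7: 0.02, 8: -0.01}

def gdp_fdx_terms(fdx, inf, infvol):
    """Only the terms of the model that involve FDX."""
    return (b[1]*fdx + b[2]*fdx**2 + b[5]*fdx*inf
            + b[6]*fdx*infvol + b[7]*fdx**2*inf + b[8]*fdx**2*infvol)

def analytic(fdx, inf, infvol):
    """b1 + 2*b2*FDX + b5*INF + b6*INFVOL + 2*b7*FDX*INF + 2*b8*FDX*INFVOL"""
    return (b[1] + 2*b[2]*fdx + b[5]*inf + b[6]*infvol
            + 2*b[7]*fdx*inf + 2*b[8]*fdx*infvol)

fdx, inf, infvol = 1.5, 3.0, 0.8
h = 1e-6
numeric = (gdp_fdx_terms(fdx + h, inf, infvol)
           - gdp_fdx_terms(fdx - h, inf, infvol)) / (2*h)
print(abs(numeric - analytic(fdx, inf, infvol)) < 1e-5)
```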

        I ran the following GMM command:

        xtabond2 rgdpg ihs_inigdppc_lag1 fdxs2 fdxsquar2 ihs_inf c.fdxs2#c.ihs_inf c.fdxsquar2#c.ihs_inf ihs_gfcf ihs_gov ihs_trd ihs_lbor, gmm(rgdpg ihs_inigdppc fdxs2 fdxsquar2 c.fdxs2#c.ihs_inf c.fdxsquar2#c.ihs_inf , lag(2 2) collapse eq(diff)) iv(ihs_inf ihs_gfcf ihs_gov ihs_trd ihs_lbor, eq(diff)) gmm(rgdpg ihs_inigdppc fdxs2 fdxsquar2 c.fdxs2#c.ihs_inf c.fdxsquar2#c.ihs_inf, lag(2 .) collapse eq(level)) twostep robust

        I am trying to compute the standard error using the covariance matrix. The variance is:

        σ^2(dy/dx) = Var(β1) + 4 FDX^2 Var(β2) + INF^2 Var(β5) + INFVOL^2 Var(β6) + 4 FDX^2 INF^2 Var(β7) + 4 FDX^2 INFVOL^2 Var(β8) + 4 FDX Cov(β1,β2) + 2 INF Cov(β1,β5) + 2 INFVOL Cov(β1,β6) + 4 FDX INF Cov(β2,β5) + 4 FDX INFVOL Cov(β2,β6) + 4 FDX INF Cov(β1,β7) + 8 FDX^2 INF Cov(β2,β7) + 4 FDX INFVOL Cov(β1,β8) + 8 FDX^2 INFVOL Cov(β2,β8) + 4 FDX INF^2 Cov(β5,β7) + 4 FDX INFVOL^2 Cov(β6,β8) + the remaining cross terms: 2 INF INFVOL Cov(β5,β6) + 4 FDX INF INFVOL Cov(β5,β8) + 4 FDX INF INFVOL Cov(β6,β7) + 8 FDX^2 INF INFVOL Cov(β7,β8)
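        This expansion is the delta method: Var(dy/dx) = g' Σ g, where g is the gradient of the marginal effect with respect to the coefficients and Σ is their covariance matrix. Writing it in matrix form automatically picks up every pairwise covariance and is far less error-prone than expanding by hand. A sketch in Python, with a purely hypothetical covariance matrix (in Stata, Σ would come from e(V) after estimation, and -margins- performs this computation for you):

```python
# Delta-method variance of
#   dGDP/dFDX = b1 + 2*b2*F + b5*I + b6*V + 2*b7*F*I + 2*b8*F*V
# computed as g' Vcov g. All numbers are hypothetical placeholders.
F, I, V = 1.5, 3.0, 0.8

# Gradient of the marginal effect w.r.t. (b1, b2, b5, b6, b7, b8):
g = [1.0, 2*F, I, V, 2*F*I, 2*F*V]

# Hypothetical symmetric 6x6 covariance matrix of those coefficients:
vcov = [[0.04, 0.01, 0.00, 0.00, 0.00, 0.00],
        [0.01, 0.02, 0.00, 0.00, 0.00, 0.00],
        [0.00, 0.00, 0.03, 0.01, 0.00, 0.00],
        [0.00, 0.00, 0.01, 0.05, 0.00, 0.00],
        [0.00, 0.00, 0.00, 0.00, 0.02, 0.01],
        [0.00, 0.00, 0.00, 0.00, 0.01, 0.03]]

# Quadratic form g' Vcov g:
var = sum(g[i] * vcov[i][j] * g[j] for i in range(6) for j in range(6))
se = var ** 0.5
print(se)
```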

        1- Can I compute this using the -margins- command? If yes, what is the command, please?
        2- What is the -margins- command for an interaction term, say FDX * INF, at the mean, minimum, and maximum?

        Thank you

        Badiah



        • #5
          I want to measure the marginal effect of the interaction terms FDX * INF, FDX * INFVOL , FDX2 * INF and FDX2 * INFVOL on GDP
          Stop right there. There is no such thing as the marginal effect of an interaction term. As you have noted yourself, marginal effects are first-order partial derivatives. Interaction terms are associated with second-order mixed partial derivatives.

          You would be able to get marginal effects for all these variables if you made full use of factor variable notation. Your command only makes partial use of it, and as a result, application of -margins- would give incorrect results.

          The way to revise your command is to get rid of your hand-calculated squared variables and instead rely on factor-variable notation to emulate them in the regression. So, eliminate terms like fdxsquar2 and replace them with c.fdxs2#c.fdxs2. Also, more generally, use the ## operator instead of the # operator to ensure that all necessary subinteractions are automatically included. If you want to represent an interaction between X, X2, and Y, do that as c.X##c.X##c.Y. Then you can get the average marginal effect of X with -margins, dydx(X)-, or the marginal effects of X conditional on specified values of Y with -margins, dydx(X) at(Y = (list of specific values of Y))-.
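          Applied to your command, the revised syntax might look something like this (a sketch only: the gmm() and iv() instrument options from your original command are elided here, and the at() values are placeholders that you would replace with values of ihs_inf meaningful in your data):
          Code:
          xtabond2 rgdpg ihs_inigdppc_lag1 c.fdxs2##c.fdxs2##c.ihs_inf ihs_gfcf ihs_gov ihs_trd ihs_lbor, gmm(...) iv(...) twostep robust
          margins, dydx(fdxs2)
          margins, dydx(fdxs2) at(ihs_inf = (0 1 2))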

          Now, I am not a user of -xtabond2-, which is not an official Stata command. Your use of it suggests that it does support factor-variable notation, so I'm inferring that it does so fully, and will support this approach. If it does not, I'm afraid I can't offer you a workaround as I do not know much about its underlying mathematics.
