
  • How to Test Statistical Significance of Average Effect in Interaction Regression Model

    Hi everyone:

    I am running a multiple regression model with interactions between Female and some Xs and would like to calculate the average effect on Y of Female and test for its statistical significance. My regression model is of the form:

    lnY = f(Female, FullTime, YrsSenior, YrsJob, YrsPrior, admin, staff, Femadmin, Femstaff)

    where lnY = log Salary and Femadmin and Femstaff are interactions between Female and admin and staff.

    The MR results are as follows:

    . reg $salary Female $experience staff admin $feminter1, robust

    Linear regression Number of obs = 1,442
    F(9, 1432) = 323.99
    Prob > F = 0.0000
    R-squared = 0.6884
    Root MSE = .21134

    ------------------------------------------------------------------------------
                 |               Robust
       LogSalary | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
          Female |  -.1264494   .0207647    -6.09   0.000    -.1671819    -.085717
        FullTime |    .072019   .0136683     5.27   0.000     .0452069     .098831
       YrsSenior |   .0157707   .0009773    16.14   0.000     .0138536    .0176877
          YrsJob |  -.0014718   .0013556    -1.09   0.278    -.0041309    .0011874
        YrsPrior |   .0032921   .0006148     5.36   0.000     .0020862     .004498
           staff |  -.5281492   .0212004   -24.91   0.000    -.5697364   -.4865619
           admin |    .232245   .0416696     5.57   0.000     .1505051    .3139849
        femadmin |   -.036489   .0556346    -0.66   0.512     -.145623     .072645
        femstaff |   .0594871   .0252234     2.36   0.018     .0100083     .108966
           _cons |    11.0827   .0225681   491.08   0.000     11.03843    11.12697
    ------------------------------------------------------------------------------

    . sum staff admin if Female == 1

        Variable |    Obs        Mean    Std. dev.    Min    Max
    -------------+---------------------------------------------------------
           staff |    886    .5474041    .4980289       0      1
           admin |    886    .0665914    .2494539       0      1


    What I would like to do is calculate the average log-salary pay gap for females. I can calculate this by hand by substituting the means of admin and staff among females into the derivative of the regression equation with respect to Female:

    Avg Effect Female = -0.1264494 - (.036489)*(.0665914) + (.0594871)*(.5474041) = -0.09632

    But there has to be an automated way to do this, especially if I have a model with a lot of interaction terms. Copying and pasting reg coefs and means into Excel is a pain.

    I would also like to be able to test for whether the average Female effect of -0.09632 is statistically significant. The F-test for interactions will tell me if the interaction coefs are non-zero, but this is not the same as testing whether the average effect is non-zero. Any suggestions would be GREATLY appreciated! Thanks in advance.
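    The average effect here is a linear combination of coefficients, c'b, so a delta-method standard error can be formed as sqrt(c'Vc) from the coefficient covariance matrix. A minimal numeric sketch of that logic, using the point estimates and female-only means from the output above, but with a made-up diagonal V built from the posted standard errors (the real test needs the full robust covariance matrix, e(V), including off-diagonal terms):

```python
import numpy as np

# Coefficients on (Female, femadmin, femstaff) from the posted output.
b = np.array([-0.1264494, -0.036489, 0.0594871])
# c = (1, mean of admin among females, mean of staff among females).
c = np.array([1.0, 0.0665914, 0.5474041])

effect = c @ b          # the average effect computed by hand above
print(round(effect, 5))  # -0.09632

# Delta-method variance is c' V c. V here is a DIAGONAL matrix assembled
# from the posted standard errors, ignoring covariances -- illustration
# only; the actual covariances change the answer.
V = np.diag([0.0207647, 0.0556346, 0.0252234]) ** 2
se = float(np.sqrt(c @ V @ c))
z = effect / se
```

    With the true e(V) in hand this is exactly what a linear-combination test computes; ignoring the covariances, as above, only shows the mechanics.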

    Rob Toutkoushian
    Professor
    University of Georgia



  • #2
    There is a better way. First, you have to go back and rewrite your regression command using factor-variable notation. See -help fvvarlist- for information on how to do that: the details differ depending on whether the variables involved are continuous or discrete. Use the ## operator for the interactions. (## is explained in -help fvvarlist-.) Once you have rerun the regression that way, you can use the -margins- command. By the way, what you calculated by hand is not the average marginal effect; it is the marginal effect at the means. Both of those can be obtained from the -margins- command. The former would be -margins Female-, and the other would be -margins Female, atmeans-.
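    The distinction between the two quantities can be illustrated numerically. A small sketch with synthetic data and made-up coefficients (none of these numbers come from the posted model): the effect of Female for observation i is bF + bFA*admin_i + bFS*staff_i, and the question is what you average or plug in.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
admin = rng.integers(0, 2, size=n)   # synthetic 0/1 covariates
staff = rng.integers(0, 2, size=n)
bF, bFA, bFS = -0.13, -0.04, 0.06    # made-up coefficients

# Average marginal effect: compute the effect per observation, then average.
per_obs = bF + bFA * admin + bFS * staff
ame = per_obs.mean()

# Marginal effect at the means: plug the covariate means into the effect.
mem = bF + bFA * admin.mean() + bFS * staff.mean()

print(np.isclose(ame, mem))
```

    For a model that is linear in the interacted covariates the two coincide over the same sample (as the sketch confirms); they diverge in nonlinear models, and also when the averaging sample differs, e.g. means among females only versus the full estimation sample.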



    • #3
      Thank you, this is very helpful. I figured that the margins command would be the way to do it. However, I'm having trouble getting it to work right in the interaction model.

      I went back and reformatted the regression model using interaction notation and obtained the same results as before:

      . reg LogSalary Female $experience admin staff Female##admin Female##staff, robust

      note: 1.Female omitted because of collinearity.
      note: 1.admin omitted because of collinearity.
      note: 1.staff omitted because of collinearity.

      Linear regression Number of obs = 1,442
      F(9, 1432) = 323.99
      Prob > F = 0.0000
      R-squared = 0.6884
      Root MSE = .21134

      -------------------------------------------------------------------------------
                    |               Robust
          LogSalary | Coefficient  std. err.      t    P>|t|     [95% conf. interval]
      --------------+----------------------------------------------------------------
             Female |  -.1264494   .0207647    -6.09   0.000    -.1671819    -.085717
           FullTime |    .072019   .0136683     5.27   0.000     .0452069     .098831
          YrsSenior |   .0157707   .0009773    16.14   0.000     .0138536    .0176877
             YrsJob |  -.0014718   .0013556    -1.09   0.278    -.0041309    .0011874
           YrsPrior |   .0032921   .0006148     5.36   0.000     .0020862     .004498
              admin |    .232245   .0416696     5.57   0.000     .1505051    .3139849
              staff |  -.5281492   .0212004   -24.91   0.000    -.5697364   -.4865619
           1.Female |          0  (omitted)
            1.admin |          0  (omitted)
                    |
       Female#admin |
                1 1 |   -.036489   .0556346    -0.66   0.512     -.145623     .072645
                    |
            1.staff |          0  (omitted)
                    |
       Female#staff |
                1 1 |   .0594871   .0252234     2.36   0.018     .0100083     .108966
                    |
              _cons |    11.0827   .0225681   491.08   0.000     11.03843    11.12697
      -------------------------------------------------------------------------------

      From here, I want to use the margins command to calculate the change in lnY for Female evaluated at the means for females. But when I try:

      margins, dydx(Female)

      I get the following error message:
      . margins, dydx(Female)
      invalid dydx() option;
      variable Female may not be present in model as factor and continuous predictor
      r(111);

      . margins, dydx(*)
      invalid dydx() option;
      variable Female may not be present in model as factor and continuous predictor
      r(111);

      The other tricky part is that I want to evaluate the derivative at the means for only Females as opposed to the mean for both men and women combined.

      Any suggestions would be wonderful, and thanks again for your help.

      Rob



      • #4
        The problem is the way you did the factor variable notation.

        Do it like this:

        Code:
        reg LogSalary $experience i.Female##i.(admin staff), robust
        In your code, you left Female, admin, and staff in as separate variables in addition to the Female##admin and Female##staff interactions. Now, Female##whatever gets expanded as i.Female i.whatever Female#whatever. (The i.'s are there because, by default, a variable in an interaction is discrete unless you specify it as continuous with c..) Stata then sees that both Female and i.Female are in the model--but they are collinear, so one of them gets dropped. The one that got dropped was i.Female, just because it was later in the model. (Stata usually chooses the variable closest to the end of the command to drop when it has to break a collinearity, although you cannot rely on this.) So now you are left with Female and Female#whatever. This gives rise to a contradiction: Female by itself with no specification is, by default, continuous, whereas Female in Female#whatever without specification is, by default, discrete. So -margins- didn't know how to treat Female.

        If you do it the way I show you here, there will be no redundant variables so the entire problem goes away.
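        The collinearity described above is easy to see directly: for a 0/1 variable, the "continuous" column Female and the factor-variable indicator 1.Female are literally the same column, so the design matrix loses rank and something has to be dropped. A tiny sketch with synthetic data (nothing here comes from the posted model):

```python
import numpy as np

female = np.array([0.0, 1.0, 1.0, 0.0, 1.0])
dummy = (female == 1).astype(float)   # plays the role of the 1.Female indicator

# Design matrix with a constant, Female, and the 1.Female dummy: the last
# two columns are identical, so the matrix has rank 2 instead of 3, and
# the estimator must drop one of them.
X = np.column_stack([np.ones_like(female), female, dummy])
print(np.linalg.matrix_rank(X))  # 2
```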

        Factor-variable notation takes a little getting used to, but once you're accustomed to it, it's one of Stata's best features.



        • #5
          Yes, that did the trick!! Thank you so much.

          Rob
