
  • simple effects after regression

    Hi all, how do we test for simple effects in a 2-way interaction of continuous predictors (±1 SD) after [regress]? For example, test whether low-IV1-low-IV2 is different from low-IV1-high-IV2. I've been reading the forum, googling, and reading the manual, but I'm still not having success. I'd welcome any feedback, thank you.

  • #2
    There may be a simple way to extract this from the margins command--certainly for the discrete variable case there would be. But this one isn't hard to do "by hand." So you have a model
    Code:
    Y = b0 + b1*V1 + b2*V2 + b12*V1*V2 (maybe + other terms) + error
    Let's denote the low and high values of V2 you are focusing on by L2 and H2, respectively. And let's denote the low value of V1 by L1. So to contrast the expected value of outcome at (L1, L2) and (L1, H2), you just plug in the values into that equation and subtract, getting:

    Code:
    E(Y(L1, H2) - Y(L1, L2)) = b2*(H2-L2) + b12*L1*(H2-L2)
    So to estimate that, along with the usual inferential statistics you can:

    Code:
    lincom (H2-L2)*_b[V2] + L1*(H2-L2)*_b(V1#V2)
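    The algebra here is easy to sanity-check numerically. Below is a minimal sketch in Python (not Stata; simulated data and made-up coefficients, purely illustrative), showing that the plug-in-and-subtract formula matches a direct difference of fitted values:

```python
import numpy as np

# simulate from a hypothetical model: Y = b0 + b1*V1 + b2*V2 + b12*V1*V2 + error
rng = np.random.default_rng(0)
n = 200
V1 = rng.normal(size=n)
V2 = rng.normal(size=n)
Y = 1.0 + 0.5 * V1 - 0.3 * V2 + 0.8 * V1 * V2 + rng.normal(scale=0.1, size=n)

# least-squares fit; columns: intercept, V1, V2, V1*V2
X = np.column_stack([np.ones(n), V1, V2, V1 * V2])
b0, b1, b2, b12 = np.linalg.lstsq(X, Y, rcond=None)[0]

# focal values: L1 = mean - 1 sd of V1; L2/H2 = mean -/+ 1 sd of V2
L1 = V1.mean() - V1.std()
L2, H2 = V2.mean() - V2.std(), V2.mean() + V2.std()

def pred(v1, v2):
    """Fitted value of Y at (v1, v2)."""
    return b0 + b1 * v1 + b2 * v2 + b12 * v1 * v2

direct = pred(L1, H2) - pred(L1, L2)              # subtract the two fitted values
formula = b2 * (H2 - L2) + b12 * L1 * (H2 - L2)   # the formula from this post

print(direct, formula)  # the two quantities agree to rounding error
```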

    Comment


    • #3
      Thank you, that manual approach makes clear sense. I'm not familiar with the e() function. Looks like it just stores a scalar, OK. What is the _b later? Does that call that stored scalar in some way?

      Maybe there is a more general case, but this code seems to only treat simple slopes and not simple effects: http://www.ats.ucla.edu/stat/stata/faq/conconb.htm (see bottom for code)... I've used it successfully for that.

      Comment


      • #4
        Ok, I am reading the help file for underscore variables right now. Interesting.

        Comment


        • #5
          OK, I don't use the terminology "simple effects" and "simple slopes," so maybe we're not communicating. The code I gave you will calculate the expected difference between Y(L1, H2) and Y(L1, L2). If you want the difference in the slope of Y with respect to V1 at V2 = H2 and at V2 = L2, that is even simpler (and can be figured out with just a minor modification of the derivation I gave in #2):

          Code:
          lincom (H2-L2)*_b[V1#V2]
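          The same kind of arithmetic check works here (hypothetical numbers, not from the thread): the slope of Y with respect to V1 is b1 + b12*V2, so the difference between the slopes at V2 = H2 and V2 = L2 is exactly b12*(H2 - L2):

```python
# hypothetical coefficients and focal values, purely for checking the algebra
b1, b12 = 0.5, 0.8
L2, H2 = -1.0, 1.0

# slope of Y with respect to V1 at a given V2 is b1 + b12*V2
slope_at_H2 = b1 + b12 * H2
slope_at_L2 = b1 + b12 * L2

# the difference of slopes is what the lincom estimates: (H2-L2)*b12
diff = slope_at_H2 - slope_at_L2
print(diff, (H2 - L2) * b12)
```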

          Comment


          • #6
            Sorry, wasn't trying to use specialized terminology. You provided exactly what I was asking for, the expected difference. Maybe it's also possible in a more general case, like in that link, I'm not sure.

            Comment


            • #7
              Hi Clyde and others, I'm having trouble implementing the solution in #2. I'm still just a beginner with Stata code. Maybe I'm not translating it into my case properly?

              Code:
              E(Y(L1, H2) - Y(L1, L2)) = b2*(H2-L2) + b12*L1*(H2-L2)
              If this is run, Stata doesn't recognize the E command. If it's changed to "e", then Stata tries to call "y" like a function. I started to think this wasn't code but rather a model for the test. So I tried the next part. I apologize for any rookie moves here. Next,

              Code:
              lincom (H2-L2)*_b[V2] + L1*(H2-L2)*_b(V1#V2))
              This yielded "H2 not found", which persisted even after I created a variable H2. I don't understand this error. I went back to using the full variable names multiplied by the ±SD values, which then gave "unknown function _b()" even though I had just run xtreg, so I substituted the coefficients in manually:

              Code:
              lincom (1*V2-(-1*V2))*-.11 + (-1*V1)*-.02*(1*V2-(-1*V2))
              and then the error reads: "not possible with test". Would love some additional guidance if anyone has time.
              Last edited by Cameron Brick; 30 Mar 2015, 20:35.

              Comment


              • #8
                OK, I need to clear up some misunderstandings and typos.

                The topmost block of code is not Stata code. It's just an equation meant to motivate the actual code in the two subsequent code blocks. I put it in a code block because I thought it would be easier to read that way. But it is not a Stata command at all, and I did not mean for you to attempt to run it. You should just run your regression model to start this off.

                For the second block of code, H2 and L2 are supposed to be replaced by the numbers corresponding to the high and low values of V2 that you are interested in. You can either do that by directly typing in the numbers, if you know them, or, if they have to be calculated from your data, by replacing them with local macros containing the corresponding calculated results. For example, at one point I think you said that H2 and L2 were respectively mean + 1 sd and mean - 1 sd of the values of V2, and L1 was mean - 1 sd of V1. So in that case, the actual code would be

                Code:
                // RUN REGRESSION MODEL HERE
                regress Y c.V1##c.V2 and maybe other variables
                
                summarize V1 // OR MAYBE summarize V1 if e(sample)
                local L1 = r(mean) - r(sd)
                
                summarize V2 // OR MAYBE summarize V2 if e(sample)
                local H2 = r(mean) + r(sd)
                local L2 = r(mean) - r(sd)
                
                lincom (`H2'-`L2')*_b[V2] +`L1'*(`H2' - `L2')*_b[V1#V2]
                Note: the final factor in the final term, _b[V1#V2], has square brackets, not parentheses: the parentheses were my typographical error. (Stata interpreted the parentheses in _b(V1#V2) as an attempt to call a non-existent function _b(), which accounts for the error message you got.) In using this code, pay scrupulous attention to all of the details of the punctuation.

                Your third code block contains: lincom (1*V2-(-1*V2))*-.11 + (-1*V1)*-.02*(1*V2-(-1*V2)). I don't understand what you were trying to do here. It is not what you would get by just substituting the actual coefficients of V1 and V2 into the formula I gave as an illustration, and honestly, I can't figure out how you arrived at it. In any case, the code I have put in the code block in this post should work for you.

                Comment


                • #9
                  (revised)
                  Ok, I'll start on this right away. I just learned about macros and calling them, so that certainly helps. Thanks for your patience here. I am trying to be careful as you suggest. It looks like calling e(sample) makes sure we drop cases with missing data from the regression? That's good. In this case it's tiny, just .005%.

                  Here's where I'm at.
                  Code:
                  xtreg beh c.V1##c.V2 [covariates]
                  sum V1 if e(sample)
                  local L1 = r(mean) - r(sd)*1.5 // ±1.5 SD
                  local H1 = r(mean) + r(sd)*1.5
                  
                  sum V2 if e(sample)
                  local L2 = r(mean) - r(sd)     // ±1   SD
                  local H2 = r(mean) + r(sd)
                  
                  lincom ((`H2' - `L2')*_b[V2]) + (`L1'*(`H2' - `L2')*_b[V1#V2]) // contrast at low x
                  I'm getting a syntax error on the last line and I'm trying to track it down. Am I supposed to see a value when I call this?
                  Code:
                  di `H1'
                  Because that just displays nothing, blank. Is that why there's a syntax error? The two _b[] coefficients display as expected when called separately. I think the error may be in these sections somehow.
                  Code:
                  (`H2' - `L2')
                  Last edited by Cameron Brick; 30 Mar 2015, 22:02.

                  Comment


                  • #10
                    I took the code you posted above and ran it on the auto.dta data set, renaming some variables to correspond to beh, V1, and V2. Obviously, this is not your real data, but I wanted to verify that the code has no syntax errors. It runs just fine (Stata output pasted from the Results window):

                    Code:
                    . sysuse auto, clear
                    (1978 Automobile Data)
                    
                    . rename price beh
                    
                    . rename mpg V1
                    
                    . rename headroom V2
                    
                    .
                    . xtset rep78
                           panel variable:  rep78 (unbalanced)
                    
                    // YOUR CODE PASTED FROM HERE ON OUT
                    
                    .
                    . xtreg beh c.V1##c.V2 // [covariates] EXCEPT THAT I COMMENTED OUT THE COVARIATES HERE
                    
                    Random-effects GLS regression                   Number of obs      =        69
                    Group variable: rep78                           Number of groups   =         5
                    
                    R-sq:  within  = 0.2574                         Obs per group: min =         2
                           between = 0.0175                                        avg =      13.8
                           overall = 0.2165                                        max =        30
                    
                                                                    Wald chi2(3)       =     17.96
                    corr(u_i, X)   = 0 (assumed)                    Prob > chi2        =    0.0004
                    
                    ------------------------------------------------------------------------------
                             beh |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                              V1 |  -114.4112   281.9207    -0.41   0.685    -666.9656    438.1431
                              V2 |   632.8509   2014.159     0.31   0.753    -3314.828     4580.53
                                 |
                       c.V1#c.V2 |  -45.56212   97.56147    -0.47   0.640    -236.7791    145.6549
                                 |
                           _cons |   9503.518   6119.739     1.55   0.120     -2490.95    21497.99
                    -------------+----------------------------------------------------------------
                         sigma_u |          0
                         sigma_e |  2630.3386
                             rho |          0   (fraction of variance due to u_i)
                    ------------------------------------------------------------------------------
                    
                    . sum V1 if e(sample)
                    
                        Variable |       Obs        Mean    Std. Dev.       Min        Max
                    -------------+--------------------------------------------------------
                              V1 |        69    21.28986    5.866408         12         41
                    
                    . local L1 = r(mean) - r(sd)*1.5 // ±1.5 SD
                    
                    . local H1 = r(mean) + r(sd)*1.5
                    
                    .
                    . sum V2 if e(sample)
                    
                        Variable |       Obs        Mean    Std. Dev.       Min        Max
                    -------------+--------------------------------------------------------
                              V2 |        69           3    .8531947        1.5          5
                    
                    . local L2 = r(mean) - r(sd)     // ±1   SD
                    
                    . local H2 = r(mean) + r(sd)
                    
                    .
                    . lincom ((`H2' - `L2')*_b[V2]) + (`L1'*(`H2' - `L2')*_b[V1#V2]) // contrast at low x
                    
                     ( 1)  1.706389*V2 + 21.31322*c.V1#c.V2 = 0
                    
                    ------------------------------------------------------------------------------
                             beh |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
                    -------------+----------------------------------------------------------------
                             (1) |   108.8147   1463.076     0.07   0.941    -2758.761    2976.391
                    ------------------------------------------------------------------------------
                    And, yes, -display `H1'- should give you back the high value of V1. Since it didn't, there are a couple of possibilities I can think of:

                    1. When you ran -sum V1-, did the output actually show a mean and standard deviation? If summarize encountered a problem and couldn't calculate those statistics (e.g. insufficient N), then r(mean) or r(sd) won't exist and `H1' will be empty. (By the way, in general, if you are running some code and it isn't doing what you expect, it helps to show us exactly the code you ran and exactly what Stata gave you by pasting from the Results window into a code block.)

                    2. Did you run this code one or a few lines at a time instead of in one fell swoop? If, for example, you ran the -local H1 =...- command separately from the later code, the local H1 won't be there any more when you get to the later code. The scope of a local macro created when running a selection from a do-file ends when that selection finishes running. If you then attempt to run later lines from that do-file that call on that local macro, it's gone--what kind of error that produces depends on the context in which you try to use it. Here it would produce precisely the results you describe. Try running the code all at once. I understand the desire to run it one or two lines at a time to see what's going on--but you can't do that.

                    Comment


                    • #11
                      Ah. Amazing. It was failing because I ran the code piecemeal, exactly as you guessed. It's working now.

                      What a help you have been! I am grateful.

                      I learned during this thread: macros, local, underscore variables, lincom, pasting error code verbatim, and the better practice of specifying each model before coding it.

                      Hopefully I didn't create any errors here. I worked at it for a while and tried to make it cleaner.
                      Code:
                      * contrast effects (consider z-scored variables)
                      gen V1 =   // specify
                      gen V2 =   // specify
                      regress y c.V1##c.V2 // add covariates
                      margins, at(V1=(-1.5 0 1.5) V2=(-1 1)) // check margin distances  here and for -local- below
                      marginsplot
                      
                      sum V1 if e(sample)
                      local L1 = r(mean) - r(sd)*1.5 // ±1.5 SD
                      local H1 = r(mean) + r(sd)*1.5
                      
                      sum V2 if e(sample)
                      local L2 = r(mean) - r(sd)     // ±1   SD
                      local H2 = r(mean) + r(sd)
                      
                      * contrast at L1 model: E(Y(L1, H2) - Y(L1, L2)) = b2*(H2-L2) + b12*L1*(H2-L2)
                      lincom (_b[V2]*(`H2' - `L2')) + (`L1'*_b[V1#V2]*(`H2' - `L2')) // contrast at L1
                      lincom (_b[V2]*(`H2' - `L2')) + (`H1'*_b[V1#V2]*(`H2' - `L2')) // contrast at H1
                       lincom (_b[V1]*(`H1' - `L1')) + (`L2'*_b[V1#V2]*(`H1' - `L1')) // contrast at L2
                       lincom (_b[V1]*(`H1' - `L1')) + (`H2'*_b[V1#V2]*(`H1' - `L1')) // contrast at H2
                      
                       * slope differences
                       lincom (`H1'-`L1')*_b[V1#V2] // change in slope of Y on V2 from V1 = L1 to V1 = H1
                       lincom (`H2'-`L2')*_b[V1#V2] // change in slope of Y on V1 from V2 = L2 to V2 = H2
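                      A quick arithmetic check of the contrast lines above (a Python sketch with made-up coefficients, not the real estimates): the contrast across V1 at fixed V2 = L2 is b1*(H1-L1) + b12*L2*(H1-L1), i.e. every term scales by (H1 - L1):

```python
# made-up coefficients and focal values, only to verify the contrast algebra
b0, b1, b2, b12 = 2.0, 0.5, -0.3, 0.8
L1, H1 = -1.5, 1.5
L2, H2 = -1.0, 1.0

def pred(v1, v2):
    """Model prediction b0 + b1*V1 + b2*V2 + b12*V1*V2."""
    return b0 + b1 * v1 + b2 * v2 + b12 * v1 * v2

# contrast across V1 at V2 = L2: direct difference vs. closed-form
direct = pred(H1, L2) - pred(L1, L2)
formula = b1 * (H1 - L1) + L2 * b12 * (H1 - L1)
print(direct, formula)
```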

                      Comment


                      • #12
                        I would suggest you look more closely at the margins command. Margins will give you dy/dx for any values of the x variables. You just set up the dy/dx outputs you need in one margins command and then you can easily test for equality. If v1 and v2 are continuous, you need to pick specific values of v1 and v2.

                        I've generated y as

                        Code:
                        g y = var1 + var2 + var1*var2 + 10*rnormal(78787)

                        The regression statement uses factor notation. c. says that you have a continuous variable. A single # would give just var1 x var2 -- you could do the same regression with reg y c.var1 c.var2 c.var1#c.var2


                        Code:
                        . reg y c.var1##c.var2
                        
                              Source |       SS       df       MS              Number of obs =       6
                        -------------+------------------------------           F(  3,     2) =  115.80
                               Model |  19143.7776     3  6381.25921           Prob > F      =  0.0086
                            Residual |  110.214557     2  55.1072786           R-squared     =  0.9943
                        -------------+------------------------------           Adj R-squared =  0.9857
                               Total |  19253.9922     5  3850.79844           Root MSE      =  7.4234
                        
                        -------------------------------------------------------------------------------
                                    y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        --------------+----------------------------------------------------------------
                                 var1 |  -6.388329   9.194944    -0.69   0.559    -45.95098    33.17432
                                 var2 |  -.2146348   1.261271    -0.17   0.881    -5.641446    5.212177
                                      |
                        c.var1#c.var2 |   1.406546   .3656979     3.85   0.061    -.1669253    2.980017
                                      |
                                _cons |   787888.1   33.14194  2.4e+04   0.000     787745.5    788030.7
                        -------------------------------------------------------------------------------
                        
                        . margins, dydx(var1) at(var1=2 var2=30) at(var1=4 var2=40) post
                        
                        Conditional marginal effects                    Number of obs      =         6
                        Model VCE    : OLS
                        
                        Expression   : Linear prediction, predict()
                        dy/dx w.r.t. : var1
                        
                        1._at        : var1            =           2
                                       var2            =          30
                        
                        2._at        : var1            =           4
                                       var2            =          40
                        
                        ------------------------------------------------------------------------------
                                     |            Delta-method
                                     |      dy/dx   Std. Err.      t    P>|t|     [95% Conf. Interval]
                        -------------+----------------------------------------------------------------
                        var1         |
                                 _at |
                                  1  |   35.80805   2.587395    13.84   0.005     24.67539    46.94071
                                  2  |    49.8735   5.851308     8.52   0.013     24.69736    75.04965
                        ------------------------------------------------------------------------------
                        
                        . test _b[1bn._at]=_b[2._at]
                        
                         ( 1)  [var1]1bn._at - [var1]2._at = 0
                        
                               F(  1,     2) =   14.79
                                    Prob > F =    0.0614


                        If you want to see what the coefficient labels are, run margins with the post option

                        Code:
                        margins, dydx(var1) at(var1=2 var2=30) at(var1=4 var2=40) post

                        and then do:

                        Code:
                        margins, coefl
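                        And the two dy/dx values line up with the regression table above: with an interaction, dy/dx with respect to var1 is just b1 + b12*var2, so the -test- of equality is really a test that b12*(40 - 30) is zero. A quick check in Python using the coefficients pasted above:

```python
# coefficients copied from the regression output in this post
b_var1 = -6.388329   # var1
b_inter = 1.406546   # c.var1#c.var2

# with the interaction, dy/dx w.r.t. var1 depends only on var2
dydx_at_30 = b_var1 + b_inter * 30   # margins reports 35.80805
dydx_at_40 = b_var1 + b_inter * 40   # margins reports 49.8735

print(dydx_at_30, dydx_at_40)

# so testing equality of the two dy/dx is the same as testing b12*(40-30) = 0
print(dydx_at_40 - dydx_at_30, b_inter * 10)
```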

                        Comment


                        • #13
                          Hi, thanks Phil for the comment. It's not for lack of trying with margins; I just don't understand the help file, or your suggested code. My apologies for being a beginner.

                          1) Why is integration being used?
                          2) I thought I understood # and ##, so I'm not sure why you were discussing them. You went on to use the same method I did.
                          3) What are _at and 1bn. ?
                          4) I tried the code and it doesn't yield a result that overlaps with any of the ones from the previous method I posted, so I'm not sure what it's doing. I probably applied it wrong?
                          Code:
                          margins, dydx(V1) at(V1=-1.5 V2=-1) at(V1=-1.5 V2=1) post
                          margins, coefl
                          test _b[1bn._at]=_b[2._at]

                          Comment
