Different results from manual interaction term and # command

Wendy Zeng

#1


05 Jun 2020, 17:05

Hello,

I'm getting different coefficients when I manually create an interaction term versus when I use the # operator in Stata, and I am not sure why.
I replicated a simple version of my code using auto.dta for ease of replicability. Thanks for any help in advance!

Code:
sysuse auto.dta, clear
gen dummy = headroom<3
gen dummyinteract = dummy*headroom
reg price dummy dummy#c.headroom
reg price dummy dummyinteract

Edit: I should add that I'm using Stata 16.1 on a firewalled server.

Last edited by Wendy Zeng; 05 Jun 2020, 17:10.
Carlo Lazzaro

#2

06 Jun 2020, 00:44

Wendy:
welcome to this forum.
Your code interacts a variable with a part of the same variable.
More helpful replies are conditional on your posting what Stata gave you back, too (as per the FAQ).

Kind regards,
Carlo
(Stata 19.0)

Scott Merryman

#3

06 Jun 2020, 17:14

Because the two specifications are not the same. The first, with factor-variable notation, includes both levels of the dummy variable in the interaction. In the manual specification you have included only one of the levels, which forces the headroom slope to be zero when dummy = 0. In the example below I label the values of the dummy variable as A and B. In the full model specification you need to include the interaction both when A is true and when B is true. I think part of the difficulty in understanding comes from not including the main effect (headroom as a variable in the model).

Code:

sysuse auto.dta, clear
gen dummy = (headroom<3) 
label define lab 0 "A" 1 "B"
label values dummy lab 
reg price i.dummy  i.dummy#c.headroom, noheader
//Interaction coefficients are just the slopes at 
// different levels of the dummy variable
margins dummy, dydx(headroom)
//This is made clearer when the main effects are included
reg price i.dummy##c.headroom, noheader

//Manual specification
gen dummyinteract = dummy*headroom

tab dummy, gen(D)
gen D1interact = D1*headroom 
gen D2interact = D2*headroom
//Full model
reg price D2 D1interact D2interact
//Original Model
reg price D2  D2interact, noheader
reg price dummy  dummyinteract, noheader
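The following sketch, added here and not part of Scott's post, checks the mapping he describes on the same auto.dta example with -regress-. The scalar names s0, s1, fv_main, fv_int and man_int are illustrative and do not come from the thread. If the reasoning above is right, three things should hold:
- The dummy = 0 slope from the first model should equal the headroom main effect in the ## model.
- The ## interaction should equal the difference between the two slopes.
- The manual product should reproduce that interaction once headroom itself is in the regression.

```stata
sysuse auto.dta, clear
gen dummy = (headroom < 3)
gen dummyinteract = dummy*headroom

* (1) One headroom slope per level of dummy, no headroom main effect
quietly regress price i.dummy i.dummy#c.headroom
scalar s0 = _b[0.dummy#c.headroom]
scalar s1 = _b[1.dummy#c.headroom]

* (2) Main effect plus interaction: headroom is the slope when dummy==0,
*     the interaction is the difference in slopes (s1 - s0)
quietly regress price i.dummy##c.headroom
scalar fv_main = _b[headroom]
scalar fv_int  = _b[1.dummy#c.headroom]

* (3) Manual product, this time with headroom's main effect added
quietly regress price dummy headroom dummyinteract
scalar man_int = _b[dummyinteract]

display "slope, dummy==0: " scalar(s0) "   headroom main effect: " scalar(fv_main)
display "s1 - s0:         " scalar(s1) - scalar(s0)
display "## interaction:  " scalar(fv_int) "   manual: " scalar(man_int)
```

If you drop headroom from the third regression, the original discrepancy comes back. That model forces the slope for dummy = 0 to be zero.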


Pia Andres

#4

19 Jan 2022, 15:13

Hello, I am dealing with a similar issue and have come across this thread - but I think the response given above does not apply to me.

I am running a regression using ppmlhdfe with two dummy variables and the interaction between them. This is constructed as follows:

Code:

gen interaction = D1*D2
ppmlhdfe y D1#D2 control i.year, vce(robust)
ppmlhdfe y D1 interaction D2 control i.year, vce(robust)

I ran this comparison mostly to see if the results are the same, as the way esttab outputs and labels the first version is kind of ugly and confusing. However, while the coefficients on D1 and D2 in the second version match those of D1 = 1, D2 = 0 and D1 = 0, D2 = 1 in the first version, the interaction term is completely different - wrong sign, wrong magnitude, significant in the first version but insignificant in the second. The coefficient on D1 = 0, D2 = 0 which is explicitly outputted in the first version is omitted due to collinearity, so I feel the results really should be identical.

I have re-run this using the reg command to make sure it's not a ppml issue, but the same thing happened. I have also tried adding the dummies and interaction as explicit factor variables:

Code:

ppmlhdfe y i.D1 i.interaction i.D2 control i.year, vce(robust)

but the outcome did not change.

Although my case is a bit different (as it uses two dummy variables, and I already included both dummies individually) I tried to apply Scott's response anyway, generating both levels of the first dummy variable and interacting both with the other dummy, as follows:

Code:

tab D1, gen(d)
gen d1D2 = d1*D2
gen d2D2 = d2*D2
ppmlhdfe y D1 D2 d1D2 d2D2 control i.year, vce(robust)

but what happens is that d2D2 is omitted because of collinearity - not surprisingly - and the results are the same. Does anyone have any clues as to why this is?
Carlo Lazzaro

#5

20 Jan 2022, 01:24

Pia:
welcome to this forum.
What if:

Code:

gen interaction = D1*D2
ppmlhdfe y D1##D2 control i.year, vce(robust)
ppmlhdfe y D1 interaction D2 control i.year, vce(robust)

?

Kind regards,
Carlo
(Stata 19.0)
Nick Cox

#6

20 Jan 2022, 01:48

#4 is cross-posted at https://stackoverflow.com/questions/...cted-variables

You are asked to tell us about cross-posting. FAQ Advice #8 https://www.statalist.org/forums/help#crossposting
Jeff Wooldridge

#7

20 Jan 2022, 05:49

Pia: As is noted in the FAQ, you're likely to get better answers if you show us what Stata actually produces when you type your commands.

Frankly, when I use i.D1#i.D2 I find the output confusing because it tries to show the three combinations of zero and one relative to the base group where D1 = D2 = 0. And it usually shows D1 = 0, D2 = 1, which is not what you want. The command where you manually compute the interaction gives the correct answer. Also, using c.D1#c.D2 as the interaction will provide the correct estimates.

Carlo's suggestion of D1##D2 also produces the correct estimates.
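The algebra behind Jeff's remark can be written out. This is a standard derivation, not taken from the posts. Let \eta be the linear index with the other regressors held fixed. The first line is the model with main effects and a product term (the manual, c.D1#c.D2 or D1##D2 versions). The second is the cell parameterization that D1#D2 produces, taking D1 = D2 = 0 as the base. Matching the two cell by cell gives:

```latex
\begin{aligned}
\text{main effects + product:}\quad \eta &= \alpha + \beta_1 D_1 + \beta_2 D_2 + \beta_{12} D_1 D_2 \\
\text{cells (D1\#D2):}\quad \eta &= \alpha + \gamma_{10}\,\mathbf{1}[D_1{=}1, D_2{=}0] + \gamma_{01}\,\mathbf{1}[D_1{=}0, D_2{=}1] + \gamma_{11}\,\mathbf{1}[D_1{=}1, D_2{=}1] \\[4pt]
\gamma_{10} &= \beta_1, \qquad \gamma_{01} = \beta_2, \qquad \gamma_{11} = \beta_1 + \beta_2 + \beta_{12} \\
\Rightarrow\quad \beta_{12} &= \gamma_{11} - \gamma_{10} - \gamma_{01}
\end{aligned}
```

So the 1 1 coefficient from D1#D2 is the sum of both main effects and the interaction. That is why its sign, size and significance can differ from the manual interaction, even though both specifications span the same four cells and fit the data identically.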
Carlo Lazzaro

#8

20 Jan 2022, 05:59

When I feel lost with interactions (and as Jeff reported, this hits me especially when interactions include categorical variables with >2 levels each), I usually add the -allbaselevels- option to get a better awareness of what their coefficients are trying to tell me.

Kind regards,
Carlo
(Stata 19.0)
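A minimal way to try Carlo's tip uses the auto.dta variables from #9. The sketch calls -regress- rather than -ppmlhdfe- only so that no community-contributed command is needed. The -allbaselevels- display option adds the base levels and base cells to the coefficient table, so it is clear which cell each interaction coefficient is measured against.

```stata
sysuse auto.dta, clear
gen high_price = (price > 6165)

* Cell parameterization: coefficients are relative to the 0 0 cell
regress trunk high_price#foreign headroom, allbaselevels

* Main effects plus interaction: the base levels of each factor
* and the base cells of the interaction are listed as well
regress trunk i.high_price##i.foreign headroom, allbaselevels
```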
Pia Andres

#9

20 Jan 2022, 10:03

Apologies, Nick, for not mentioning the Stack Overflow post in my comment; I have edited my post on Stack Overflow accordingly.

Thank you very much Carlo and Jeff for your advice! I have replicated the issue as follows:

Code:

sysuse auto.dta, clear
gen high_price = 0
replace high_price = 1 if price>6165
gen interaction = high_price*foreign
ppmlhdfe trunk high_price interaction foreign headroom, vce(robust)
ppmlhdfe trunk high_price#foreign headroom, vce(robust)
ppmlhdfe trunk high_price##foreign headroom, vce(robust)
ppmlhdfe trunk high_price c.high_price#c.foreign foreign headroom, vce(robust)

As is the case with my original data, the version using # yields a different result, but the manual interaction and the version with ## or c.high_price#c.foreign result in the same output. This is good to know - I will stay clear of the # in the future!
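The next sketch shows what is going on in this replication. It uses Stata's built-in -poisson- in place of -ppmlhdfe- so that it runs without community-contributed commands. The two are assumed here to give the same point estimates because no fixed effects are absorbed. The names g10, g01, g11, b12, ll_cells and ll_manual are illustrative.

If the cell algebra holds:
- The # version and the manual version should reach the same log pseudolikelihood.
- The 1 1 cell should equal the sum of both main effects and the interaction.

```stata
sysuse auto.dta, clear
gen high_price = (price > 6165)
gen interaction = high_price*foreign

* Cell parameterization: every coefficient is relative to the 0 0 cell
quietly poisson trunk high_price#foreign headroom, vce(robust)
scalar g10 = _b[1.high_price#0.foreign]
scalar g01 = _b[0.high_price#1.foreign]
scalar g11 = _b[1.high_price#1.foreign]
scalar ll_cells = e(ll)

* Main effects plus a hand-made product term
quietly poisson trunk high_price foreign interaction headroom, vce(robust)
scalar b12 = _b[interaction]
scalar ll_manual = e(ll)

* Same fit, different labels: the 1 1 cell bundles both main effects
display "g11 - g10 - g01 = " scalar(g11) - scalar(g10) - scalar(g01)
display "manual interaction = " scalar(b12)
display "log pseudolikelihoods: " scalar(ll_cells) "  " scalar(ll_manual)
```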
Carlo Lazzaro

#10

20 Jan 2022, 10:38

Pia:
it is expected that:

Code:

ppmlhdfe trunk high_price#foreign headroom, vce(robust)

gives back different results when compared to:

Code:

ppmlhdfe trunk high_price##foreign headroom, vce(robust)

as in the first command the main conditional effects of the two terms included in the interaction cannot be estimated, because you omitted both -high_price- and -foreign- outside the interaction.

That said:

Code:

ppmlhdfe trunk high_price##foreign headroom, vce(robust)

gives back the same results as:

Code:

ppmlhdfe trunk high_price interaction foreign headroom, vce(robust)

because in both of these commands you included -high_price- and -foreign- plus their interaction (please note that, unlike the ## version, the manual version does not allow you to exploit the virtuous relationship of -fvvarlist- with -margins- and -marginsplot-).

Kind regards,
Carlo
(Stata 19.0)
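This sketch illustrates the -margins- advantage Carlo mentions. Again, built-in -poisson- stands in for -ppmlhdfe- so the example is self-contained. The ## version tells Stata that the interaction is built from high_price and foreign. As a result, -margins- can report predicted counts for each cell and the effect of foreign within each price group. With the manual interaction variable, -margins- would treat interaction as an unrelated regressor.

```stata
sysuse auto.dta, clear
gen high_price = (price > 6165)

poisson trunk i.high_price##i.foreign headroom, vce(robust)

* Predicted trunk space for each high_price x foreign cell,
* averaged over the observed values of headroom
margins high_price#foreign

* Discrete change from foreign = 0 to foreign = 1 within each price group
margins high_price, dydx(foreign)
```

-marginsplot- can then graph the last set of results.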
Yusuf Ceylan

#11

14 May 2022, 08:51

Dear All,

I am estimating a structural gravity model, specifically the average impact of an FTA indicator. I am having issues once I interact my FTA dummy with dummies for specific provisions mentioned in the agreements (such as an IPR provision).

The way I interact the two is:

ppmlhdfe total_trade_flow agree_fta#IPR, a(imp_time exp_time first#second) cluster(pair_id)

Note that once I separately include the agree_fta and IPR indicators in the above equation, in addition to the interaction term, my interaction terms are dropped because of collinearity. I have pasted the results using the agree_fta#IPR interaction term. Note that the existence of a provision depends on the existence of an FTA; therefore, the first row is empty.

HDFE PPML regression                              No. of obs      =     69,942
Absorbing 3 HDFE groups                           Residual df     =     11,135
Statistics robust to heteroskedasticity           Wald chi2(2)    =       9.70
Deviance             =  127489763.8               Prob > chi2     =     0.0078
Log pseudolikelihood = -63880220.51               Pseudo R2       =     0.9372

Number of clusters (pair_id)= 11,136
                        (Std. err. adjusted for 11,136 clusters in pair_id)
-------------------------------------------------------------------------------
              |               Robust
total_trade~w | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
--------------+----------------------------------------------------------------
agree_fta#IPR |
          0 1 |          0  (empty)
          1 0 |   .4870974   .2696324     1.81   0.071    -.0413725    1.015567
          1 1 |    .204006   .0757856     2.69   0.007     .0554689     .352543
              |
        _cons |     12.217   .0321301   380.24   0.000     12.15403    12.27997
-------------------------------------------------------------------------------

I finally came across your comments here and, to make sure I am doing it correctly, checked the equation below:

ppmlhdfe total_trade_flow agree_fta c.agree_fta#c.IPR IPR, a(imp_time exp_time first#second) cluster(pair_id)

Now the IPR provision is dropped due to collinearity, and I get the estimates below:

HDFE PPML regression                              No. of obs      =     69,942
Absorbing 3 HDFE groups                           Residual df     =     11,135
Statistics robust to heteroskedasticity           Wald chi2(2)    =       9.70
Deviance             =  127489763.8               Prob > chi2     =     0.0078
Log pseudolikelihood = -63880220.51               Pseudo R2       =     0.9372

Number of clusters (pair_id)= 11,136
                            (Std. err. adjusted for 11,136 clusters in pair_id)
-----------------------------------------------------------------------------------
                  |               Robust
 total_trade_flow | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
------------------+----------------------------------------------------------------
        agree_fta |   .4870974   .2696324     1.81   0.071    -.0413725    1.015567
                  |
c.agree_fta#c.IPR |  -.2830914   .2733026    -1.04   0.300    -.8187546    .2525718
                  |
              IPR |          0  (omitted)
            _cons |     12.217   .0321301   380.24   0.000     12.15403    12.27997
-----------------------------------------------------------------------------------

The interaction term becomes negative but statistically insignificant. However, if I include only the interaction term and remove the agree_fta and IPR variables, the estimate is positive and significant:

ppmlhdfe total_trade_flow c.agree_fta#c.IPR, a(imp_time exp_time first#second) cluster(pair_id)

HDFE PPML regression                              No. of obs      =     69,942
Absorbing 3 HDFE groups                           Residual df     =     11,135
Statistics robust to heteroskedasticity           Wald chi2(1)    =       5.18
Deviance             =  127601348.1               Prob > chi2     =     0.0229
Log pseudolikelihood = -63936012.66               Pseudo R2       =     0.9371

Number of clusters (pair_id)= 11,136
                            (Std. err. adjusted for 11,136 clusters in pair_id)
-----------------------------------------------------------------------------------
                  |               Robust
 total_trade_flow | Coefficient  std. err.      z    P>|z|     [95% conf. interval]
------------------+----------------------------------------------------------------
c.agree_fta#c.IPR |   .1688964   .0742208     2.28   0.023     .0234264    .3143665
                  |
            _cons |   12.27553   .0064697  1897.39   0.000     12.26285    12.28821
-----------------------------------------------------------------------------------

I am trying to see whether FTAs that include an IPR provision have stronger effects on trade. As I said, if I include only c.agree_fta#c.IPR in the equation instead of agree_fta c.agree_fta#c.IPR IPR, I get very different results. I would like to know which of these three specifications, if any, I can use.

If you can help me, I would really appreciate it.

Best regards
Yusuf
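The first two outputs in #11 can be read side by side using only the numbers posted; this reading is added here, not part of the thread. Both report the same deviance and log pseudolikelihood, so they appear to be the same model with different labels. Write \gamma_{10} and \gamma_{11} for the agree_fta#IPR cells 1 0 and 1 1:

```latex
\begin{aligned}
\hat\beta_{\texttt{agree\_fta}} &= \hat\gamma_{10} = 0.4870974 \\
\hat\beta_{\texttt{c.agree\_fta\#c.IPR}} &= \hat\gamma_{11} - \hat\gamma_{10} = 0.204006 - 0.4870974 = -0.2830914
\end{aligned}
```

The third specification reports a different log pseudolikelihood (-63936012.66). That is consistent with it being a restricted model, one that sets the effect of an FTA without the IPR provision to zero.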
Carlo Lazzaro

#12

14 May 2022, 10:59

Yusuf:
Joao Santos Silva is the guru for this (and many more) topic(s).
Take a look at his previous posts.
As an aside, please use CODE delimiters when posting what you typed and what Stata gave you back. Thanks.

Kind regards,
Carlo
(Stata 19.0)
Joao Santos Silva

#13

15 May 2022, 05:22

Dear Yusuf Ceylan,

Since the IPR variable can only be 1 when there is an FTA, it is already an interaction and therefore you should not add the additional interaction between IPR and FTA.

Best wishes,

Joao
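Joao's point can be checked on simulated data. The sketch below is entirely hypothetical:
- The data are made up.
- Built-in -poisson- replaces -ppmlhdfe-, and no fixed effects are absorbed.
- IPR is generated so that it can be 1 only when agree_fta is 1, mirroring the description in #11.

If the reasoning is right, including agree_fta and IPR alone reproduces the cell estimates of agree_fta#IPR. The coefficient on agree_fta is then the effect of an FTA without the provision. The coefficient on IPR is the additional effect when the provision is present.

```stata
* Hypothetical simulated data: IPR can be 1 only inside an FTA
clear
set seed 12345
set obs 2000
gen agree_fta = runiform() < 0.5
gen IPR = agree_fta * (runiform() < 0.5)
gen y = rpoisson(exp(1 + 0.5*agree_fta - 0.3*IPR))

* IPR is already the product agree_fta*IPR
assert IPR == agree_fta*IPR

* Cell parameterization, as in the first model of #11 (cell 0 1 is empty)
quietly poisson y agree_fta#IPR, vce(robust)
scalar c10 = _b[1.agree_fta#0.IPR]
scalar c11 = _b[1.agree_fta#1.IPR]

* FTA dummy plus provision dummy, with no extra product term
quietly poisson y agree_fta IPR, vce(robust)
scalar b_fta = _b[agree_fta]
scalar b_ipr = _b[IPR]

display "FTA without IPR:  " scalar(c10) "   vs agree_fta: " scalar(b_fta)
display "extra IPR effect: " scalar(c11) - scalar(c10) "   vs IPR: " scalar(b_ipr)
```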
