A question of interaction terms with indicator variables (dummy variables)

Yiting Cao

Join Date: Jan 2016

Posts: 23
#1

02 Apr 2016, 16:35

Hi everyone,

I'm running a simple regression in which earnings is my variable of interest. I want to show that its coefficient differs depending on whether earnings are positive or negative. The model is:

Y = a_1 + a_2 NI + a_3 NI×Neg + a_4 Neg + CONTROLs (1)
NI is net income; Neg is an indicator variable equal to 1 if NI < 0 and 0 otherwise.

Interestingly, I saw a paper addressing a similar question that ran the following regression instead:

Y = b_1 + b_2 Positive_NI + b_3 Negative_NI + b_4 Neg + CONTROLs (2)
Positive_NI = NI if NI > 0, and 0 otherwise
Negative_NI = NI if NI < 0, and 0 otherwise
Neg is still the indicator variable for a loss.

Although I understand that Eq. (1) answers my question, I'm curious whether Eq. (2) is statistically valid. (Eq. (2) would help emphasize my hypothesis that b_2 is negative but b_3 is positive.)

Many thanks,
Yiting
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#2

02 Apr 2016, 16:57

The two models are completely equivalent. The two models are just algebraic linear transforms of each other, different ways of parameterizing the same model. Use whichever is more convenient for you. All conclusions you draw will be identical.
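To make the equivalence concrete, here is a short derivation that uses only the definitions given in post #1. Because Positive_NI and Negative_NI split NI by sign, NI = Positive_NI + Negative_NI and NI×Neg = Negative_NI, so Eq. (1) can be rewritten term by term:

```latex
\begin{aligned}
Y &= a_1 + a_2\,\mathit{NI} + a_3\,(\mathit{NI}\times\mathit{Neg}) + a_4\,\mathit{Neg} + \text{CONTROLs} \\
  &= a_1 + a_2\,(\mathit{Positive\_NI} + \mathit{Negative\_NI}) + a_3\,\mathit{Negative\_NI} + a_4\,\mathit{Neg} + \text{CONTROLs} \\
  &= a_1 + a_2\,\mathit{Positive\_NI} + (a_2 + a_3)\,\mathit{Negative\_NI} + a_4\,\mathit{Neg} + \text{CONTROLs}
\end{aligned}
\qquad\Longrightarrow\qquad
b_1 = a_1,\quad b_2 = a_2,\quad b_3 = a_2 + a_3,\quad b_4 = a_4 .
```

Under this mapping, testing a_3 = 0 in Eq. (1) is the same hypothesis as testing b_2 = b_3 in Eq. (2).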
Yiting Cao

Join Date: Jan 2016

Posts: 23
#3

05 Apr 2016, 13:06

Hi Clyde,

Many thanks! I just confirmed that the conclusions are identical!

Best,
Yiting
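One way to run this kind of check is sketched below on simulated data. The data-generating values, the seed, and the helper variable NIxNeg are assumptions made for illustration only; they are not taken from the poster's dataset.

```stata
* Simulated data: all coefficient values here are illustrative assumptions.
clear
set seed 2016
set obs 1000
generate double NI = rnormal()
generate byte   Neg = (NI < 0)
generate double Positive_NI = cond(NI > 0, NI, 0)
generate double Negative_NI = cond(NI < 0, NI, 0)
generate double NIxNeg = NI * Neg          // hypothetical name for NI×Neg
generate double Y = 1 - 0.5*NI + 1.5*NIxNeg + 0.3*Neg + rnormal()

* Eq. (1): common slope plus an extra slope for losses
regress Y NI NIxNeg Neg
scalar r2_eq1    = e(r2)
scalar pos_slope = _b[NI]
scalar neg_slope = _b[NI] + _b[NIxNeg]
lincom NI + NIxNeg                         // slope for loss observations

* Eq. (2): separate slopes for profits and losses
regress Y Positive_NI Negative_NI Neg

* Fit and implied slopes agree up to numerical precision
assert reldif(e(r2), r2_eq1) < 1e-8
assert reldif(_b[Positive_NI], pos_slope) < 1e-8
assert reldif(_b[Negative_NI], neg_slope) < 1e-8

* Same hypothesis as a_3 = 0 in Eq. (1)
test Positive_NI = Negative_NI
```

The asserts stop the do-file with an error if the two parameterizations ever disagree, which is a convenient way to make the comparison explicit.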
Yiting Cao

Join Date: Jan 2016

Posts: 23
#4

05 Apr 2016, 15:36

Originally posted by Clyde Schechter:

The two models are completely equivalent. The two models are just algebraic linear transforms of each other, different ways of parameterizing the same model. Use whichever is more convenient for you. All conclusions you draw will be identical.

I have a follow-up question: what if I have an additional indicator variable (IND) to interact with the others?
I think the first equation becomes:
Y = a_1 + a_2 NI + a_3 NI×Neg + a_4 NI×Neg×IND + a_5 Neg + a_6 Neg×IND + a_7 NI×IND + a_8 IND + CONTROLs (1)

What about the second one?
Should it be:
Y = b_1 + b_2 Positive_NI + b_3 Negative_NI + b_4 Neg + b_5 IND + b_6 Positive_NI×IND + b_7 Negative_NI×IND + CONTROLs (2)

or

Y = b_1 + b_2 Positive_NI + b_3 Negative_NI + b_4 Neg + b_5 IND + b_6 Positive_NI×IND + b_7 Negative_NI×IND + b_8 Neg×IND + b_9 NI×IND + CONTROLs (3)
?

The question is whether I still need to include the interactions of IND with Neg and of IND with NI.

Thanks again!

Best,
Yiting
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#5

05 Apr 2016, 17:07

Yes, you still need to include the two-way interactions, so equation (3) would be the correct model to correspond to equation (1).

Even apart from your particular problem, the general rule is that wherever you have a#b#c you also need a, b, c, a#b, b#c, and a#c. There are exceptions, but when you're building models, you should start by presuming you need all of those terms, and only eliminate them if there is a compelling justification for doing so.
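As a small illustration of this rule in Stata's factor-variable notation, the sketch below fits a three-way model once with the ## operator and once with every term spelled out using # only. The variables a, b, x, and y are hypothetical and the data are simulated.

```stata
* Simulated data: a, b, x, and y are hypothetical variable names.
clear
set seed 105
set obs 400
generate byte   a = runiform() < 0.5
generate byte   b = runiform() < 0.5
generate double x = rnormal()
generate double y = 1 + a + b + x + a*b + a*x + b*x + 0.5*a*b*x + rnormal()

* ## requests the full factorial: main effects plus all lower-order interactions
regress y i.a##i.b##c.x
scalar r2_full = e(r2)
scalar k_full  = e(df_m)

* The same model with each term written out explicitly
regress y i.a i.b c.x i.a#i.b i.a#c.x i.b#c.x i.a#i.b#c.x
assert reldif(e(r2), r2_full) < 1e-10
assert e(df_m) == k_full
```

Writing a##b##c is therefore a safeguard against accidentally leaving out one of a, b, c, a#b, a#c, or b#c.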
Yiting Cao

Join Date: Jan 2016

Posts: 23
#6

07 Apr 2016, 12:07

Hi Clyde,

Got it! Many thanks!

best,
Yiting
Yiting Cao

Join Date: Jan 2016

Posts: 23
#7

18 Apr 2016, 10:01

Hi everyone,

I think my Eq. (3) is wrong. The correct version is
Y = b_1 + b_2 Positive_NI + b_3 Negative_NI + b_4 Neg + b_5 IND + b_6 Positive_NI×IND + b_7 Negative_NI×IND + b_8 Neg×IND + CONTROLs (4)

b_9 NI×IND shouldn't be included, since NI is the sum of Positive_NI and Negative_NI, which makes NI×IND collinear with Positive_NI×IND and Negative_NI×IND.

Positive_NI=NI if NI>0, and 0 otherwise
Negative_NI=NI if NI<0, and 0 otherwise
Neg is still the indicator variable of loss.
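The collinearity can be written out explicitly from these definitions. The identity below shows why NI×IND is redundant, and the coefficient mapping (worked out here from the extended Eq. (1) in post #4, not stated by the posters) shows that Eq. (4) has the same eight parameters:

```latex
\mathit{NI}\times\mathit{IND}
  = (\mathit{Positive\_NI} + \mathit{Negative\_NI})\times\mathit{IND}
  = \mathit{Positive\_NI}\times\mathit{IND} + \mathit{Negative\_NI}\times\mathit{IND}

\begin{aligned}
b_1 &= a_1, & b_2 &= a_2, & b_3 &= a_2 + a_3, & b_4 &= a_5, \\
b_5 &= a_8, & b_6 &= a_7, & b_7 &= a_4 + a_7, & b_8 &= a_6 .
\end{aligned}
```

The mapping follows from substituting NI = Positive_NI + Negative_NI, NI×Neg = Negative_NI, and NI×Neg×IND = Negative_NI×IND into the extended Eq. (1).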

Clyde, please let me know if you think I'm wrong.

Best,
Yiting
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#8

18 Apr 2016, 10:32

Yes, it looks like you are right.

You know, you don't need to do it this way. Basically you have 3 variables, Positive_NI, Negative_NI, and Neg, and then you have a variable IND, that you want to include along with its interactions with all of the others. So instead of risking making mistakes, you can let Stata do this for you automatically, by using factor-variable notation:

Code:

regression_command Y i.IND##(c.Positive_NI c.Negative_NI i.Neg) // AND OTHER COVARIATES, OPTIONS AS APPROPRIATE

Stata will generate all the appropriate combinations you need. If any of them turn out to be collinear, Stata will automatically omit the redundant term(s) during estimation.
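A minimal sketch of this approach on simulated data follows, using -regress- as the estimation command. The data-generating values and the hand-built product variables (PosXIND, NegXIND, NegXIND2, NIxIND) are assumptions for illustration. It checks that the factor-variable model matches a hand-built Eq. (4), and that adding NI×IND as in Eq. (3) does not change the model's rank.

```stata
* Simulated data: all names of hand-built products are hypothetical.
clear
set seed 187
set obs 1000
generate double NI = rnormal()
generate byte   Neg = (NI < 0)
generate byte   IND = runiform() < 0.5
generate double Positive_NI = cond(NI > 0, NI, 0)
generate double Negative_NI = cond(NI < 0, NI, 0)
generate double Y = 1 - 0.5*Positive_NI + Negative_NI + 0.3*Neg + 0.2*IND ///
    + 0.4*IND*Positive_NI - 0.6*IND*Negative_NI + rnormal()

* Factor-variable version: Stata builds the interactions with IND
regress Y i.IND##(c.Positive_NI c.Negative_NI i.Neg)
scalar r2_fv = e(r2)
scalar k_fv  = e(df_m)

* Eq. (4) built by hand
generate double PosXIND  = Positive_NI * IND
generate double NegXIND  = Negative_NI * IND
generate byte   NegXIND2 = Neg * IND
regress Y Positive_NI Negative_NI Neg IND PosXIND NegXIND NegXIND2
assert reldif(e(r2), r2_fv) < 1e-10
assert e(df_m) == k_fv

* Eq. (3) adds NI×IND, which is redundant: one term is omitted, rank is unchanged
generate double NIxIND = NI * IND
regress Y Positive_NI Negative_NI Neg IND PosXIND NegXIND NegXIND2 NIxIND
assert e(df_m) == k_fv
```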
Yiting Cao

Join Date: Jan 2016

Posts: 23
#9

18 Apr 2016, 20:34

Many thanks, Clyde! I just want to make sure that no one is confused by my posts.

Actually, in my real test I have two indicator variables in addition to Positive_NI, Negative_NI, and Neg (namely IND_treat and IND_post), which makes the case more complicated. It involves three-way (actually four-way) interactions.
Clyde Schechter

Join Date: Apr 2014

Posts: 30119
#10

18 Apr 2016, 22:19

If you are going up to three and four way interactions, I think the case for using factor-variable notation to get Stata to generate all the terms is even more compelling. Also, by using factor-variable notation, you will be able to use the -margins- command after estimation, which will greatly simplify figuring out predicted values in various combinations of your variables, and marginal effects.
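As a rough sketch of what -margins- can do here, the example below fits the factor-variable model on simulated data and then asks for the slopes of Positive_NI and Negative_NI at each level of IND. The data and coefficient values are illustrative assumptions.

```stata
* Simulated data: coefficient values are illustrative assumptions.
clear
set seed 211
set obs 1000
generate double NI = rnormal()
generate byte   Neg = (NI < 0)
generate byte   IND = runiform() < 0.5
generate double Positive_NI = cond(NI > 0, NI, 0)
generate double Negative_NI = cond(NI < 0, NI, 0)
generate double Y = 1 - 0.5*Positive_NI + Negative_NI + 0.3*Neg + 0.2*IND ///
    + 0.4*IND*Positive_NI - 0.6*IND*Negative_NI + rnormal()

regress Y i.IND##(c.Positive_NI c.Negative_NI i.Neg)

* Slopes of the two earnings variables, separately for IND = 0 and IND = 1
margins IND, dydx(Positive_NI Negative_NI)
```

One caveat: -margins- has no way of knowing that Positive_NI, Negative_NI, and Neg are all derived from NI, so predictions requested at arbitrary combinations of those variables may describe impossible cases. The slopes above do not suffer from this, because the model is linear in each earnings variable.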
Yiting Cao

Join Date: Jan 2016

Posts: 23
#11

19 Apr 2016, 14:15

Hi Clyde,

Thanks for your comments! I used factor-variable notation. However, I can't find a way to compare coefficients when using factor-variable notation (e.g. test i.IND#posini=i.IND#negni).

Please let me know if I'm missing anything!

Best,
Yiting
Richard Williams

Join Date: Apr 2014

Posts: 5008
#12

19 Apr 2016, 14:26

Use the -coeflegend- option (which can be abbreviated -coefl-) when running (or replaying) your estimation command. It shows the names of the parameters, which makes it easier to set up the tests you want.

-testparm- can also be handy: e.g., if i.relig is in your estimation command, then -testparm i.relig- tests whether the associated coefficients jointly differ from zero.
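A short sketch of both suggestions on simulated data is shown below. The variable names follow the thread, but the data and coefficient values are illustrative assumptions. The key point is that a coefficient name includes the factor level and the c. prefix, as displayed by coeflegend.

```stata
* Simulated data: coefficient values are illustrative assumptions.
clear
set seed 242
set obs 1000
generate double NI = rnormal()
generate byte   Neg = (NI < 0)
generate byte   IND = runiform() < 0.5
generate double Positive_NI = cond(NI > 0, NI, 0)
generate double Negative_NI = cond(NI < 0, NI, 0)
generate double Y = 1 - 0.5*Positive_NI + Negative_NI + 0.3*Neg + 0.2*IND ///
    + 0.4*IND*Positive_NI - 0.6*IND*Negative_NI + rnormal()

* coeflegend lists each coefficient as _b[...] instead of showing statistics
regress Y i.IND##(c.Positive_NI c.Negative_NI i.Neg), coeflegend

* Compare the two interaction coefficients using the names from the legend
test _b[1.IND#c.Positive_NI] = _b[1.IND#c.Negative_NI]
assert r(df) == 1

* Joint test that all interactions with IND are zero
testparm i.IND#c.Positive_NI i.IND#c.Negative_NI i.IND#i.Neg
```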

-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
StataNow Version: 19.5 MP (2 processor)
WWW: https://www3.nd.edu/~rwilliam
Yiting Cao

Join Date: Jan 2016

Posts: 23
#13

19 Apr 2016, 15:01

Got it!! Many thanks Richard!!

Best,
Yiting