Four-Way Interaction for Dummy Variables

Simone Nuzzo

Join Date: May 2016

Posts: 17
#1


20 May 2016, 14:51

Dear All,

I am estimating a model with a four-way interaction among dummy variables.

I have four dummies, j, k, l, m, each taking the value 0 or 1.

To build the term j*k*l*m, I include the four main effects (j, k, l, m), the six two-way interaction terms (j*k, j*l, j*m, k*l, k*m, l*m), and the four three-way interaction terms (j*k*l, j*k*m, j*l*m, k*l*m).
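For reference, Stata's factor-variable notation can generate all of these lower-order terms automatically: the ## operator requests the full factorial, so none of the product terms has to be built by hand. A minimal sketch on made-up data, assuming all 16 combinations of the four dummies are actually observed (unlike the data described in this thread, where k, l, m are undefined when j = 0):

```stata
* Made-up data: 400 observations, four independent 0/1 dummies
clear
set seed 2016
set obs 400
foreach v in j k l m {
    generate byte `v' = runiform() < 0.5
}
generate y = 1 + j + k + l + m + rnormal()

* i.j##i.k##i.l##i.m expands to the 4 main effects, the 6 two-way terms,
* the 4 three-way terms and the single four-way term
regress y i.j##i.k##i.l##i.m
display "model df = " e(df_m)
```

With every cell populated, the model degrees of freedom should equal 4 + 6 + 4 + 1 = 15.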

Variables k, l, m only make sense when j takes the value 1. In other words, the cases captured by k, l, m are subcases of j = 1.

The problem is the following. Take, for instance, the three-way term k*l*m. It measures whether the effect of k*l on the dependent variable changes with the level of m when j is 0. But when j is 0, the variables k, l, m no longer make sense, since they only exist when j takes the value 1.

So how should I deal with this issue? How should I interpret terms like k*l, k*m, l*m, and k*l*m, i.e., all the terms whose interpretation assumes that j is 0?

Thank you so much!
Phil Bromiley

Join Date: Apr 2014

Posts: 4348
#2

20 May 2016, 16:06

You don't interpret one interaction term while ignoring the others. What you have is a set of meaningful combinations of j, k, l, and m. If you use factor-variable notation, you can use margins to obtain predictions for each meaningful combination of j, k, l, and m.
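A minimal sketch of this approach, on made-up data in which all four dummies vary freely (an assumption; in the original poster's data k, l, m are undefined when j = 0). After fitting the full factorial with factor-variable notation, -margins- returns one predicted mean per combination, and at(j=1) restricts the output to the j = 1 combinations the poster considers meaningful. The matrix name cellmeans is purely illustrative.

```stata
* Made-up data: four independent 0/1 dummies
clear
set seed 2016
set obs 400
foreach v in j k l m {
    generate byte `v' = runiform() < 0.5
}
generate y = 1 + j + k + l + m + rnormal()

* Full factorial in factor-variable notation
regress y i.j##i.k##i.l##i.m

* Predicted y for each of the 8 combinations of k, l, m, with j fixed at 1
margins k#l#m, at(j=1)
matrix cellmeans = r(b)
matrix list cellmeans
```

Reading these cell predictions side by side is usually easier than decoding 15 separate interaction coefficients.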
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#3

21 May 2016, 06:59

Simone:
welcome to the list.
I share Phil's comment.
As an aside, three-way interactions are heavy stuff to disseminate (graphically or, even worse, in numbers): I usually get fed up after two-way interactions (and my occasional audience loses track even earlier).

Kind regards,
Carlo
(Stata 19.0)
Simone Nuzzo

Join Date: May 2016

Posts: 17
#4

03 Jun 2016, 10:57

Thank you very much for your kind replies!

Unfortunately, I am still a bit confused about some points. That's why I am starting from the main effects before moving on to interactions!

More specifically, I am running a panel data analysis in which each unit (Subject) is observed over 63 periods (Period). I have a continuous dependent variable y, and my only regressors are the four dummies j, k, l, m from my previous post. Since k, l, m only make sense when j = 1, I coded k, l, m as missing whenever j = 0. The 63 periods form blocks of seven, and within each block the market conditions (captured by the dummies) do not change. For instance, over the first 7 periods j is always 0 and k, l, m are missing; over the second 7 periods j is always 1 and each of k, l, m is fixed at either 0 or 1; and so on.

I typed:

xtset Subject Period
xtreg y i.j i.k i.l i.m, fe

and Stata omits j because of collinearity.

How can I estimate the effect on y of j moving from 0 to 1?

Thanks a lot
Simone
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17711
#5

03 Jun 2016, 11:45

Simone:
probably you can't with the current model specification.
On top of that, please note that Stata applies listwise deletion whenever an observation has a missing value in the -depvar- or in any of the -indepvars-.
As a closing remark, you would be better off posting what you typed and what Stata gave you back (as per FAQ #12): that is worth more than tons of lines describing the problem you stumbled upon.

Kind regards,
Carlo
(Stata 19.0)
Clyde Schechter

Join Date: Apr 2014

Posts: 30101
#6

03 Jun 2016, 11:56

As Carlo rightly points out, by coding k, l, and m as missing when j = 0, you are excluding all j = 0 observations from your regression. In the remaining sample j is always 1, so it has no variation left and is dropped as collinear.
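One way to see this is to check which values of j survive in the estimation sample. The sketch below builds a made-up panel mimicking the layout in post #4 (20 hypothetical subjects, 63 periods in 9 blocks of 7, k, l, m missing when j = 0, simulated y; the helper variable block is illustrative) and then tabulates j over e(sample):

```stata
* Made-up panel mimicking post #4: 20 subjects x 63 periods,
* market conditions constant within 9 blocks of 7 periods
clear
set seed 2016
set obs 20
generate Subject = _n
expand 63
bysort Subject: generate Period = _n
generate byte block = ceil(Period/7)
generate byte j = block > 1                          // block 1: j = 0
generate byte k = mod(floor((block-2)/4), 2) if j == 1
generate byte l = mod(floor((block-2)/2), 2) if j == 1
generate byte m = mod(block-2, 2) if j == 1          // missing when j == 0
generate y = 10 + 2*j + rnormal()

xtset Subject Period
xtreg y i.j i.k i.l i.m, fe

* Listwise deletion keeps only j == 1, so j is constant in the sample
tabulate j if e(sample)
```

Only 20 × 56 = 1,120 of the 1,260 observations enter the regression, all with j = 1.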

I think it is a mistake to try to specify j as a dichotomy here. Since k, l, and m are only meaningful when j = 1, it seems that k, l, and m are not really separate variables, but rather they represent additional levels within j. Perhaps you should not have variables k, l, and m at all, and j should run from 0 through 8 (0 for j = 0, plus one level for each of the 2 × 2 × 2 = 8 combinations of k, l, and m).
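A sketch of that recoding, on a made-up panel mimicking post #4 (20 hypothetical subjects, 9 blocks of 7 periods, simulated y). The variable name market and its coding (0 = j is 0; 1 to 8 = the eight k/l/m combinations when j is 1) are illustrative choices, not something from the thread. The lincom line assumes that an equal-weighted average of the eight taxed conditions versus the untaxed one is the contrast of interest.

```stata
* Made-up panel mimicking post #4
clear
set seed 2016
set obs 20
generate Subject = _n
expand 63
bysort Subject: generate Period = _n
generate byte block = ceil(Period/7)
generate byte j = block > 1
generate byte k = mod(floor((block-2)/4), 2) if j == 1
generate byte l = mod(floor((block-2)/2), 2) if j == 1
generate byte m = mod(block-2, 2) if j == 1
generate y = 10 + 2*j + rnormal()

* Hypothetical single categorical variable with 9 levels:
* 0 = j is 0; 1-8 = the eight combinations of k, l, m when j is 1
generate byte market = cond(j == 0, 0, 1 + 4*k + 2*l + m)
tabulate market

xtset Subject Period
xtreg y i.market, fe                  // no observations are lost now

* Equal-weighted average of the 8 taxed conditions vs. untaxed (an assumption)
lincom (1.market + 2.market + 3.market + 4.market + 5.market + 6.market + 7.market + 8.market)/8

* Joint test that y does not differ across the 9 conditions
testparm i.market
```

Each coefficient on 1.market to 8.market is the difference between one taxed condition and the untaxed one.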

Alternatively, you might want to have two separate models, one with only j as a predictor, and then another, applied only to the j = 1 subset of your data, with only k, l, and m as predictors.
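A minimal sketch of the two-model approach, again on a made-up panel mimicking post #4 (20 hypothetical subjects, simulated y). The stored-estimate names m_tax and m_type are illustrative. Restricting the second model to j == 1 makes it explicit that k, l, m are only modelled where they are defined, and there their full factorial is estimable.

```stata
* Made-up panel mimicking post #4
clear
set seed 2016
set obs 20
generate Subject = _n
expand 63
bysort Subject: generate Period = _n
generate byte block = ceil(Period/7)
generate byte j = block > 1
generate byte k = mod(floor((block-2)/4), 2) if j == 1
generate byte l = mod(floor((block-2)/2), 2) if j == 1
generate byte m = mod(block-2, 2) if j == 1
generate y = 10 + 2*j + rnormal()

xtset Subject Period

* Model 1: taxed vs. untaxed, all observations
xtreg y i.j, fe
estimates store m_tax

* Model 2: type of tax only, restricted to the j == 1 observations
xtreg y i.k##i.l##i.m if j == 1, fe
estimates store m_type
```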

Another possibility is that it might make sense to set k, l, and m to zero when j = 0. Whether it makes sense to do this depends on the actual meaning of j, k, l, and m. Sometimes if something is undefined for j = 0 it is legitimate to also say that it is "absent." Sometimes it isn't. It depends on the meanings of the variables.
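Where "undefined" can legitimately be read as "absent", the recoding itself is a simple replace. A sketch on the same kind of made-up panel (20 hypothetical subjects, simulated y); whether this is substantively valid is exactly the judgment described above, and the poster later explains it is not in this case.

```stata
* Made-up panel mimicking post #4
clear
set seed 2016
set obs 20
generate Subject = _n
expand 63
bysort Subject: generate Period = _n
generate byte block = ceil(Period/7)
generate byte j = block > 1
generate byte k = mod(floor((block-2)/4), 2) if j == 1
generate byte l = mod(floor((block-2)/2), 2) if j == 1
generate byte m = mod(block-2, 2) if j == 1
generate y = 10 + 2*j + rnormal()

* Recode "undefined" as "absent" (only if substantively justified)
foreach v of varlist k l m {
    replace `v' = 0 if j == 0
}

xtset Subject Period
xtreg y i.j i.k i.l i.m, fe          // j == 0 observations are now kept
```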
Simone Nuzzo

Join Date: May 2016

Posts: 17
#7

04 Jun 2016, 11:06

Thanks for your replies!

Clyde:
the first two options you propose (recoding j into a single multi-level variable, or fitting two separate models) look very promising. Setting k, l, m to zero when j = 0 is not applicable in my specific framework: j = 0 means that the market is "not taxed" and j = 1 that it is "taxed", while k, l, m stand for different types of tax (e.g., low vs. high). So k, l, m can only be thought of as subcases of j = 1.
Fitting a specific model to the j = 1 subset of my data will probably also let me handle interaction effects better than the first option, a single j with 9 levels (one for each market condition).

I will let you know!

Best Regards
Simone