  • Four-Way Interaction for Dummy Variables

    Dear All,

    I am implementing a four-way interaction of dummy variables.

    I have four dummies: j, k, l, m. Each of them takes the value 0 or 1.

    To build up the term j*k*l*m I include four first-order terms (j, k, l, m), six two-way interaction terms (j*k, j*l, j*m, k*l, k*m, l*m) and four three-way interaction terms (j*k*l, j*k*m, j*l*m, k*l*m).

    The variables k, l, m only make sense if variable j takes value 1. In other words, the cases captured by k, l, m are subcases of j=1.

    The problem is the following. Take, for instance, the three-way term k*l*m. It measures whether the effect of k*l on the dependent variable changes with the level of m when j is 0. But when j is 0, the variables k, l, m no longer make sense, since they are defined only when j takes value 1.

    So, how should I deal with this issue? How should I interpret terms like k*l, k*m, l*m, k*l*m, i.e. all the terms that describe the case in which j takes value 0?

    Thank you so much!

  • #2
    You don't interpret one set of interaction terms while ignoring the others. You have a set of meaningful combinations of j, k, l, and m. If you use factor-variable notation, you can then use -margins- to obtain predictions for each meaningful combination of j, k, l, and m.
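
    A minimal sketch of this advice, on simulated data. The data-generating step, the coefficients, and the choice to code k, l, and m as 0 when j = 0 are assumptions made only so that the example runs; none of them come from the original post.

    ```stata
    clear
    set seed 12345
    set obs 2000

    * Hypothetical data: k, l, m vary only when j == 1 and are coded 0 otherwise
    generate byte j = runiform() < 0.75
    generate byte k = cond(j == 1, runiform() < 0.5, 0)
    generate byte l = cond(j == 1, runiform() < 0.5, 0)
    generate byte m = cond(j == 1, runiform() < 0.5, 0)
    generate double y = 1 + 0.5*j + 0.3*k + 0.2*l - 0.4*m + 0.6*k*l*m + rnormal()

    * ## requests all main effects plus all two-, three-, and four-way interactions;
    * terms that are collinear in this nested design are omitted by Stata
    regress y i.j##i.k##i.l##i.m

    * Predicted mean of y in each cell of j, k, l, m
    margins j#k#l#m
    ```

    Cells that never occur in the data (here, any cell with j = 0 and k, l, or m equal to 1) should be reported by -margins- as not estimable, which leaves only the meaningful combinations to interpret.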


    • #3
      Simone:
      welcome to the list.
      I do share Phil's comment.
      As an aside, three-way interactions are heavy stuff to disseminate (graphically or, even worse, by figures): I usually get fed up after two-way interactions (and my occasional audience loses connection even before).
      Kind regards,
      Carlo
      (Stata 19.0)


      • #4
        Thank you very much for your kind replies!

        Unfortunately, I am still a bit confused on some points. That is why I am starting from the main effects before moving on to interactions!

        More specifically, I am running a panel data analysis in which each unit (Subject) is observed over 63 periods (Period). I have a continuous dependent variable y, and my only regressors are the four dummies j, k, l, m from my previous post. Since k, l, m make sense only when j takes value 1, I set k, l, m to missing in my dataset whenever j is 0. Within each cluster of seven periods, the market conditions (captured by the set of dummies) do not change: for instance, over the first 7 periods j is always 0 and k, l, m are missing; over the second 7 periods j is always 1 and each of k, l, m is fixed at either 0 or 1; and so on.

        I code:
        "xtset Subject Period"
        "xtreg y i.j i.k i.l i.m, fe" and I get j omitted because of collinearity.

        How can I estimate the effect on y of j moving from 0 to 1?

        Thanks a lot
        Simone


        • #5
          Simone:
          probably you can't with the current model specification.
          On top of that, please note that Stata applies listwise deletion whenever an observation has a missing value in the -depvar- or in any of the -indepvars-.
          As a closing remark, you would be better off posting what you typed and what Stata gave you back (as per FAQ #12): it is worth more than tons of lines devoted to describing the problem the poster stumbled upon.
          Kind regards,
          Carlo
          (Stata 19.0)


          • #6
            As Carlo rightly points out, by specifying k, l, and m with missing values when j = 0, you are excluding all j = 0 observations from your regression.
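
            A minimal sketch of why j drops out, on simulated data. The number of subjects, the ordering of the blocks, and the coefficients are assumptions; only the variable names, the 63 periods in blocks of seven, and the two commands come from the posts above.

            ```stata
            clear
            set seed 2024
            * Hypothetical panel: 30 subjects observed over 63 periods each
            set obs 30
            generate int Subject = _n
            expand 63
            bysort Subject: generate int Period = _n
            * Nine blocks of seven periods
            generate byte block = ceil(Period/7)

            * Assumed design: block 1 has j == 0; blocks 2-9 cover the (k, l, m) cells
            generate byte j = block > 1
            generate byte k = floor((block - 2)/4) if j == 1
            generate byte l = mod(floor((block - 2)/2), 2) if j == 1
            generate byte m = mod(block - 2, 2) if j == 1
            generate double y = 1 + 0.5*j + rnormal()
            replace y = y + 0.3*k + 0.2*l - 0.4*m if j == 1

            * Every j == 0 row has missing k, l, m
            count if j == 0 & missing(k, l, m)

            xtset Subject Period
            xtreg y i.j i.k i.l i.m, fe

            * Only j == 1 rows survive listwise deletion, so j is constant
            tabulate j if e(sample)
            ```

            Because j never varies within the estimation sample, it is collinear with the constant and cannot be estimated.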

            I think it is a mistake to try to specify j as a dichotomy here. Since k, l, and m are only meaningful when j = 1, it seems that k, l, and m are not really separate variables, but rather they represent additional levels within j. Perhaps you should not have variables k, l, and m, at all, and j should run from 0 through 9.

            Alternatively, you might want to have two separate models, one with only j as a predictor, and then another, applied only to the j = 1 subset of your data, with only k, l, and m as predictors.

            Another possibility is that it might make sense to set k, l, and m to zero when j = 0. Whether it makes sense to do this depends on the actual meaning of j, k, l, and m. Sometimes if something is undefined for j = 0 it is legitimate to also say that it is "absent." Sometimes it isn't. It depends on the meanings of the variables.
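
            The first two suggestions above can be sketched as follows, again on simulated data. The variable name regime, the numbering of its levels, the block ordering, and the coefficients are all hypothetical; in real data regime would have one level for j = 0 plus one level per (k, l, m) combination that actually occurs.

            ```stata
            clear
            set seed 2024
            * Hypothetical panel: 30 subjects, 63 periods, nine blocks of seven periods
            set obs 30
            generate int Subject = _n
            expand 63
            bysort Subject: generate int Period = _n
            generate byte block = ceil(Period/7)
            generate byte j = block > 1
            generate byte k = floor((block - 2)/4) if j == 1
            generate byte l = mod(floor((block - 2)/2), 2) if j == 1
            generate byte m = mod(block - 2, 2) if j == 1
            generate double y = 1 + 0.5*j + rnormal()
            replace y = y + 0.3*k + 0.2*l - 0.4*m if j == 1
            xtset Subject Period

            * Option 1: one categorical variable, base level 0 = the j == 0 condition
            generate byte regime = 0 if j == 0
            replace regime = 1 + 4*k + 2*l + m if j == 1
            xtreg y i.regime, fe

            * Option 2: two separate models
            * (a) the effect of j alone, using all observations
            xtreg y i.j, fe
            * (b) k, l, m and their interactions, within the j == 1 subset only
            xtreg y i.k##i.l##i.m if j == 1, fe
            ```

            In option 1 each coefficient on regime compares one j = 1 condition with the j = 0 baseline; in option 2(b) the interactions of k, l, and m are interpreted only among the j = 1 observations.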


            • #7
              Thanks for your replies!

              Clyde:
              the first two options you propose (j running from 0 through 9, or having two separate models) look very powerful. The possibility of setting k, l, m to zero when j=0 is not applicable to my specific framework. Indeed, j=0 means that the market is "not taxed" and j=1 that the market is "taxed". k, l, m stand for different types of tax (e.g. low vs. high). So k, l, m can only be thought of as subcases of j=1.
              Probably, the option of having a specific model applied to the j=1 subset of my data will also allow me to handle interaction effects better than the first option of having j over 8 levels (to specify the 9 variations).

              I will let you know!

              Best Regards
              Simone


