
  • How to do non-linear interaction in a linear model?

    Dear All,

    Hi. I am aware that my title is probably confusing. I will explain my question in detail:

    I would like to analyze how the effect of a continuous treatment, T, is conditional on a continuous covariate, X. The outcome variable is Y. I learned the most typical method to estimate a model with an interaction term, XT:

    Code:
    reg Y X T XT
    I think that this model estimates how the effect of T on Y depends linearly on X. The coefficient on XT tells how much the effect of a one-unit change in T changes when X increases by one unit.

    But I am wondering whether this is a reasonable model. I suspect that the effect of T on Y depends on X in a non-linear manner. Specifically, I hypothesize that the effect of T on Y will be significantly larger only when a subject has a value of X above the 75th percentile. So I make a dummy variable, D, for X being above the 75th percentile:

    Code:
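    // split X into quartiles (temp takes values 1 to 4)
    xtile temp = X, nquantiles(4)
    // create one indicator per quartile: g1, g2, g3, g4
    tabulate temp, generate(g)
    rename g4 D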
    Thus D is an indicator for whether a subject has X above the 75th percentile. D captures a categorical effect of X.

    However, I am confused about how I should build the interaction model. My question is: when I interact D with T, should I also include X, or the main effect of D? That is, which one of the following models should I use?

    Code:
    //model 1
    reg Y T D DT
    //model 2
    reg Y T D DT X
    //model 3
    reg Y T X DT
    The first model assumes that there is no linear effect of X on Y. I do not think that this is desirable. I still believe that X has a linear effect.
    The second model seems to be the most typical interaction model. The basic lesson I learned is that when I include an interaction term (DT), I have to include the main effects of both variables. However, I think that what the second model measures is somewhat awkward. It measures the linear effect of X on Y, and by including the main effect of D, it also measures whether X has a categorical effect on Y (D is an indicator for X above the 75th percentile). I feel that it somewhat resembles a regression discontinuity design (RDD), and I do not really want to assume that there is such a discontinuity in the effect of X.
    I think that the third model reflects what I want to measure and what I want to assume. It keeps a linear effect of X on Y, and it also measures how the effect of T is contingent on the 75th-percentile threshold of X (which is D). Unlike the second model, it does not include a main effect of D, so it does not assume a separate jump in Y at the threshold. Thus I prefer Model 3.
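
    To make the comparison concrete, the three specifications can be written out as equations. This is only a sketch: the coefficient symbols (the betas and the error term) are labels introduced here for illustration and do not appear in the Stata code above.

    ```latex
    \begin{align*}
    \text{Model 1:}\quad Y &= \beta_0 + \beta_1 T + \beta_2 D + \beta_3 (D \times T) + \varepsilon \\
    \text{Model 2:}\quad Y &= \beta_0 + \beta_1 T + \beta_2 D + \beta_3 (D \times T) + \beta_4 X + \varepsilon \\
    \text{Model 3:}\quad Y &= \beta_0 + \beta_1 T + \beta_3 (D \times T) + \beta_4 X + \varepsilon \\[4pt]
    \text{All three:}\quad \frac{\partial Y}{\partial T} &= \beta_1 + \beta_3 D
    \end{align*}
    ```

    In this notation, Model 3 is Model 2 with the restriction that the coefficient on D is zero. In Model 2 that coefficient is the difference in Y between D = 1 and D = 0 when T = 0, holding X fixed, so Model 3 amounts to assuming that this difference is exactly zero.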

    What confuses me is whether Model 3 is legitimate. That is, I do not know whether it is acceptable to include an interaction term without the corresponding main effect (here, D), and I am wondering whether Model 3 has some hidden problems.

    Thank you very much for your advice.

  • #2
    There is nothing stopping you from interacting X with nonlinear functions of T. For example:
    Code:
    gen lT = ln(T)
    reg Y X T c.X#c.lT
    Or, using a quadratic interaction with T:
    Code:
    reg Y X T c.X#(c.T c.T#c.T)
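
    One possible way to read the quadratic-interaction model is with Stata's margins command. The lines below are a minimal sketch, assuming Y, X and T are already in memory; the choice of quartiles of X as evaluation points is only an illustration. Every term is written in factor-variable notation so that margins can trace how T enters the model.

    ```stata
    // Sketch only: assumes Y, X and T are already in memory.
    // Same quadratic-interaction model, with every term in factor-variable notation
    regress Y c.X c.T c.X#(c.T c.T#c.T)
    // Average marginal effect of T, evaluated at three quartiles of X
    margins, dydx(T) at((p25) X) at((p50) X) at((p75) X)
    ```

    If the marginal effect of T differs noticeably across the three values of X, that suggests the effect of T does depend on X.
    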
    I hope this helps.
    Alfonso Sanchez-Penalver
