  • Interaction versus stratification - when adding covariates, is it better to have a fully-interacted model with covariate interactions?

    Dear Statalist,

    I have a question about interactions. Say I am interested in comparing coefficients across sub-groups (by gender), and say I am interested in doing this using an interaction model for exposure-by-gender and testing the significance of the interaction coefficients.

    But if I have covariates in the model, do I need to interact my effect modifier (gender) with ALL possible variables in the model? I understand that this "fully-interacted" model will generate the same coefficients as in the stratified model.

    Code:
    y = exposure + gender + exposure*gender + covariate + covariate*gender
    
    with the gender-specific exposure coefficients equal to the stratified model:
    y = exposure + covariate if gender=0
    y = exposure + covariate if gender=1
    Whereas if I do not interact these covariates with the effect modifier, then the gender-specific exposure estimates will not match the exposure estimates from the gender-stratified model.

    Code:
    y = exposure + gender + exposure*gender + covariate
    
    with the gender-specific exposure coefficients NOT equal to the stratified model:
    y = exposure + covariate if gender=0
    y = exposure + covariate if gender=1
    My research question of interest is focused on the exposure-outcome relationship and how it varies by gender, with the covariates serving only as control variables. So would it be better to run only the exposure-by-gender interaction and leave the covariates to the "pooled" sample (without covariate-by-gender interaction)?

    A related question is for difference-in-difference methods, which are basically a treatment-by-time interaction. If we have covariates in a DiD model, do we include time interacted with all possible covariates? Or just interacted with the treatment variable?
    Code:
    y = treatment + post + treatment*post + covariate + covariate*post
    
    versus
    
    y = treatment + post + treatment*post + covariate

    Sample code in Stata:

    Code:
    sysuse auto, clear
    
    *stratified model 1 focusing on length -> price relationship within foreign==0
    regress price length weight i.rep78 if foreign==0
    lincom _b[length]
    
    *stratified model 2 focusing on length -> price relationship within foreign==1
    regress price length weight i.rep78 if foreign==1
    lincom _b[length]
    
    *fully-interacted model, length -> price estimates (within levels of foreign) equivalent to stratified models above
    quietly regress price c.length##i.foreign c.weight##i.foreign i.rep78##i.foreign
    lincom _b[length]
    lincom _b[length]+_b[1.foreign#c.length]
    
    *but if only interacting the exposure variable of interest, and not covariates, then length -> price estimates (within levels of foreign) will not be equal to stratified models above
    quietly regress price c.length##i.foreign c.weight i.rep78
    lincom _b[length]
    lincom _b[length]+_b[1.foreign#c.length]
    Last edited by Jenny Williams; 16 Apr 2018, 20:29.

  • #2
    There is no one-size-fits-all answer. There are two aspects of this problem to consider.

    The first is whether or not you expect that the effects of the covariates actually differ by gender. If so, then you have to include the covariate#gender interactions to capture that in your model. But if the effects of the covariates are not expected to differ appreciably by gender, then there is no reason to include the gender#covariate interactions. I wish to emphasize here that I am referring to the marginal effects of the covariates, not the actual values of the covariates, differing by gender. This is the primary consideration. I should add that it is possible that on this basis you will choose to include some, but not all, of the gender#covariate interactions.
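
    As a rough sketch of that last possibility, using the auto data from the original post and assuming, purely for illustration, that only the effect of weight is expected to differ by foreign, a partially interacted model might look like this:

    ```stata
    * illustration only: which covariates to interact is a substantive judgment,
    * and the choice of weight here is an assumption made for the example
    sysuse auto, clear

    * length (the exposure) and weight are interacted with foreign; rep78 enters pooled
    regress price c.length##i.foreign c.weight##i.foreign i.rep78

    * length -> price slope within each level of foreign
    margins foreign, dydx(length)
    ```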

    A secondary consideration is the number of covariates. For these purposes, an n-level categorical variable counts as n-1 covariates. If there are enough of them, then including all the gender#covariate interactions may just introduce too many variables into the model, leaving you with little statistical power for your hypothesis tests and massive overfitting of the data. Ideally, your sample size is adequate to accommodate all of this, but in practice this is not always the case.
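
    To get a feel for how quickly the parameter count grows, one could compare the model degrees of freedom stored in e(df_m) with the estimation sample size in e(N) for the two specifications from the original post. This is only a sketch on the auto data, not a formal power calculation:

    ```stata
    sysuse auto, clear

    * exposure-by-foreign interaction only, covariates pooled
    quietly regress price c.length##i.foreign c.weight i.rep78
    display "exposure interaction only: df_m = " e(df_m) ", N = " e(N)

    * every covariate also interacted with foreign
    quietly regress price c.length##i.foreign c.weight##i.foreign i.rep78##i.foreign
    display "fully interacted: df_m = " e(df_m) ", N = " e(N)
    ```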

    Unsolicited advice: to get the gender-specific marginal effects, rather than using -lincom-, I recommend using:

    Code:
    margins foreign, dydx(length)
    It is simpler and shorter to type, and you are protected from making any algebraic mistakes. -lincom- is simple enough to use in the case of a simple dichotomy#continuous interaction, but with more complicated models it is very easy to mess up. -margins- makes it easy and foolproof.
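
    For instance, after the fully interacted model from the original post, a sketch along these lines should give both foreign-specific slopes and their difference. The r. contrast operator in the second -margins- call is an addition here, not part of the original code; in a linear model like this one, the contrast should reproduce the 1.foreign#c.length coefficient.

    ```stata
    sysuse auto, clear
    quietly regress price c.length##i.foreign c.weight##i.foreign i.rep78##i.foreign

    * length -> price slope within each level of foreign
    margins foreign, dydx(length)

    * difference between the two slopes (foreign==1 versus foreign==0)
    margins r.foreign, dydx(length)
    ```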
