  • Two-way interactions in Repeated Measures


    Dear Statalist users,
    I am working with data from a pretest-posttest design in which 1000 individuals were surveyed before an experiment; half of them were then assigned a treatment (genetic testing), and all of them were surveyed again about a year after the experiment.
    We are first trying to see whether individuals' opinions about race changed. The hypothesis is that the treatment group will change while the control group will not.
    Second, we want to see whether this change depends on knowledge of biology (measured with a categorical variable).
    The command we are using is xtmixed (the DV is continuous). I use Stata version 14 on a Mac.

    For the first inquiry, we simply run the following command:
    Code:
    xtmixed DV i.Treatment##i.time i.BiologyKnowledge i.Male i.Age i.Educ ideology || ID:, var reml
    For the second inquiry, should we run a three-way interaction among the treatment variable, the time variable, and Biology Knowledge?
    Code:
    xtmixed DV i.Treatment##i.time##i.BiologyKnowledge i.Male i.Age i.Educ ideology || ID:, var reml
    Or a two-way interaction between the Treatment variable and the Biology Knowledge variable, controlling for the time variable?
    Code:
    xtmixed DV i.Treatment##i.BiologyKnowledge i.time i.Male i.Age i.Educ ideology || ID:, var reml
    I am quite certain that we should use three-way interaction, but I can't quite explain why the two-way interaction (without time) is wrong and what exactly the two-way interaction shows.

    As far as I understand, unless we incorporate the time variable into the interaction, the two-way interaction between Biology Knowledge and the Treatment variable will only show whether the effect of the Treatment variable on the DV (the difference between the Control and Treatment Groups?) is contingent on Biology Knowledge. But I am not quite sure which difference between the Control and Treatment Groups this interaction term captures. The time variable will show the simple effect of time on the DV (average change?), I assume, but then what exactly does the two-way interaction term between Treatment and Biology Knowledge show?

    Another option is to run two separate two-way interactions: one with Treatment and time, and one with Treatment and Biology Knowledge.

    Code:
    xtmixed DV i.Treatment##i.time i.Treatment##i.BiologyKnowledge i.Male i.Age i.Educ ideology || ID:, var reml

    How would that one differ from a three-way interaction?
    Thanks,
    Sule
    Last edited by Sule Yaylaci; 20 Mar 2018, 17:00.

  • #2
    Your uncertainty over which model to use stems, I think, from having not resolved in your mind exactly what question you want to answer with your data.

    If you want to know whether biology knowledge moderates the effect of the treatment on DV, recognizing that the effect of the treatment on DV is given by the treatment#time interaction, you need the three way interaction i.Treatment##i.time##i.BiologyKnowledge. There is no question about that.
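One way to inspect the three-way specification after fitting it is with `contrast` and `margins`. The sketch below runs on simulated data: the variable names, the alternating treatment assignment, and the effect sizes are all assumptions made up for illustration, not the poster's data, and `mixed` is used as the current name of `xtmixed`.

```stata
* Simulated pre/post data; every number here is an illustrative assumption
clear
set seed 12345
set obs 1000
generate id = _n
generate treatment = mod(_n, 2)
generate biology_knowledge = floor(3*runiform())
generate double u = rnormal()
generate double outcome0 = 3 + u + rnormal()
generate double outcome1 = 3 + u + 0.5*treatment + 0.3*treatment*(biology_knowledge == 2) + rnormal()
reshape long outcome, i(id) j(prepost)

* Three-way model with a random intercept for each person
mixed outcome i.treatment##i.prepost##i.biology_knowledge || id:, reml

* Joint test of the three-way interaction terms
contrast treatment#prepost#biology_knowledge

* Treatment-control difference at each time point and knowledge level
margins prepost#biology_knowledge, dydx(treatment)
```

Within a knowledge level, the follow-up difference minus the baseline difference is the treatment effect for that level; the three-way terms describe how that quantity varies across levels.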

    But that may not be what you want to know. If you are simply concerned with whether the DV outcome depends on biology knowledge (and have no expectation that knowledge will modify the treatment effect), then no interaction terms involving knowledge are needed at all.

    If you use i.Treatment##i.BiologyKnowledge i.time, then you are modeling a situation in which there is no expectation that the treatment and control group's DV values will follow different time courses. In other words, it is a model in which receiving the treatment is assumed to have no effect at all on the DV any different from what happens just with the passage of time in the control group. But it also assumes that at both time periods (and to the exact same extent at each time period) the DV in each group differs according to whether or not they have biology knowledge, and the treatment groups themselves also differ on DV (both at baseline and follow-up and to the same extent at both times). I suppose if I spent enough time thinking about it I could contrive a situation that behaved in accordance with this model, but nothing comes readily to mind and it seems far-fetched.

    If you use two interactions, i.Treatment##i.time and i.Treatment##i.BiologyKnowledge your model says that the DV evolves differently over time in the treatment and control groups. It also says that there is an effect of biology knowledge on treatment, and that this effect differs between the treatment and control groups. But the difference between the effects of biology knowledge on the two groups does not change over time. Whatever that between-group difference in effect of biology knowledge on DV is at the start, it remains the same at the end; it does not evolve over the course of the study.

    If the study involves randomized assignment, it is hard to see why the last model would apply: you would expect the effect of biology knowledge on DV to be the same in two groups that were assigned at random. And, as I said, the next-to-last model is hard to make sense of at all, and certainly would be implausible in a randomized treatment assignment setting for this same reason.

    Be that as it may, the four models answer four different questions. You have to figure out which question you are asking.
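To compare what the four specifications estimate, they can be fitted to the same data and tabulated side by side. This is only a sketch on simulated data; the stored-estimate names (`time_only`, `three_way`, `no_time`, `two_two_way`) and all data-generating values are hypothetical.

```stata
* Simulated pre/post data; every number here is an illustrative assumption
clear
set seed 12345
set obs 1000
generate id = _n
generate treatment = mod(_n, 2)
generate biology_knowledge = floor(3*runiform())
generate double u = rnormal()
generate double outcome0 = 3 + u + rnormal()
generate double outcome1 = 3 + u + 0.5*treatment + rnormal()
reshape long outcome, i(id) j(prepost)

* Treatment effect only; knowledge enters as a main effect
quietly mixed outcome i.treatment##i.prepost i.biology_knowledge || id:, reml
estimates store time_only

* Knowledge moderates the treatment effect
quietly mixed outcome i.treatment##i.prepost##i.biology_knowledge || id:, reml
estimates store three_way

* No treatment#time term: the groups follow the same time course
quietly mixed outcome i.treatment##i.biology_knowledge i.prepost || id:, reml
estimates store no_time

* Two separate two-way interactions
quietly mixed outcome i.treatment##i.prepost i.treatment##i.biology_knowledge || id:, reml
estimates store two_two_way

estimates table time_only three_way no_time two_two_way, b(%9.3f)
```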



    • #3
      This is incredibly useful, Clyde. Much appreciated.



      • #4

        I have one follow-up question on the matter of two separate two-way vs. three-way interactions.

        If I were to use an OLS model instead of a mixed model, where the DV is the variable of interest (let's call it 'X') measured in Wave 2 (X_wave2), and the Wave 1 measure of the same variable is used as a covariate (X_wave1), would the interpretation of the interaction terms differ at all?
        I am trying to make sure I understand the differences between an OLS model for analyzing pretest-posttest design measures and mixed models.

        That is to say, instead of:
        Code:
        xtmixed X_wave2 X_wave1 i.Treatment##i.time i.Treatment##i.BiologyKnowledge i.Male i.Age i.Educ ideology || ID:, var reml
        I use an OLS model:

        Code:
        reg X_wave2 X_wave1 i.Treatment##c.X_wave1 i.Treatment##i.BiologyKnowledge i.Male i.Age i.Educ ideology
        If I were to interpret these two separate two-way interactions:
        Treatment##c.X_wave1: variable X in wave 1 has different effects on the DV (variable X in wave 2) in the treatment and control groups. So, the difference in X varies by Group. This interaction would be the equivalent of Treatment##time in the mixed model, right?

        Treatment##BiologyKnowledge: "There is an effect of biology knowledge on treatment [DV?], and that this effect differs between the treatment and control groups. But the difference between the effects of biology knowledge on the two groups does not change over time."
        Does this mean that the effect of Treatment on DV is dependent on biology knowledge, controlling for the interactive effect of X_wave1 with Treatment on the DV?

        Given that in the OLS model the DV is a measure of X in only the second wave (unlike in the mixed models, where the DV takes into account measures of X from both waves), with the first wave measure as a control variable, does the Treatment##BiologyKnowledge interaction only show whether there is a difference between the Treatment and Control Groups on the second wave DV conditional on Biology Knowledge?

        Would this be a feasible strategy for examining whether the treatment effect is conditional on DNA knowledge when we fail to find any significant interaction between Treatment and X_wave1?




        • #5
          Let's leave aside the issues raised by the other interactions for a moment and just concentrate on the difference between a longitudinal model and an analysis of covariance. So, stripped to its essence, the longitudinal model is:
          Code:
          mixed outcome i.treatment##i.prepost
          and that of the analysis of covariance model is
          Code:
          regress outcome_post i.treatment outcome_pre
          If there is only one pre- and one post- measurement per unit of observation (person, I guess, in your case) and if there is no missing data, these are actually algebraically equivalent. Both of them are restatements of the statistical model in which the vector [outcome1, outcome2] has a joint normal distribution conditional on treatment. The parameters of that joint normal distribution can be transformed algebraically to give the coefficient of treatment and residual variance in the analysis of covariance model, and they can also be transformed to give the (different) coefficient of treatment#prepost and the variance components at each level. They are, from an algebraic perspective, notational variants of each other. (In particular, the coefficient of treatment#prepost can be regarded as a true treatment effect, whereas the coefficient of treatment in the analysis of covariance model is attenuated by a factor equal to the intraclass correlation coefficient. There are other equations relating all of the parameters of these two models. I do not have the patience to deal with typing them all out.)

          There are some differences between the models statistically. There are differences in the degrees of freedom for inference about parameters. There are differences in the impact of missing values on the results. So from a practical perspective they are somewhat different. But the differences have nothing to do with the substance of the underlying phenomena: they deal with the different strengths and limitations of different estimation procedures. In terms of the substance, they are exactly the same model.
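The two stripped-down fits can be put side by side on simulated data with one pre and one post measurement per person and nothing missing. This is a sketch under made-up data-generating values, and the scalar names `b_ancova` and `b_mixed` are hypothetical; it simply displays both treatment coefficients so their relationship can be examined.

```stata
* Simulated wide data; every number here is an illustrative assumption
clear
set seed 2018
set obs 1000
generate id = _n
generate treatment = mod(_n, 2)
generate double u = rnormal()
generate double outcome_pre = 3 + u + rnormal()
generate double outcome_post = 3 + u + 0.5*treatment + rnormal()

* Analysis of covariance on the wide data
regress outcome_post i.treatment outcome_pre
scalar b_ancova = _b[1.treatment]

* Longitudinal model on the same data in long form
rename outcome_pre outcome0
rename outcome_post outcome1
reshape long outcome, i(id) j(prepost)
mixed outcome i.treatment##i.prepost || id:, reml
scalar b_mixed = _b[1.treatment#1.prepost]

display "ANCOVA coefficient of treatment:       " scalar(b_ancova)
display "mixed coefficient of treatment#prepost: " scalar(b_mixed)
```

With complete, balanced data like this, the `treatment#prepost` coefficient coincides with the difference-in-differences of the four cell means.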

          Now, the second model in your #4 is not that same model, because you have included an interaction term between treatment and outcome_pre. This is a very different model indeed, because it proposes that the effect of treatment (not the level of outcome but the effect of treatment itself) depends on the value of the baseline outcome variable. There is nothing comparable to that in your first model in #4, nor in any model we have been discussing to date. It is a legitimate model, and if there is some scientific basis for considering it, then you could use this model. But do be aware that it is a radically different model from anything you have proposed earlier in this thread. That said, the interpretation of the treatment#biology_knowledge interaction would be the same as explained in #2: the incorporation of the treatment#outcome_pre interaction would not change that.
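What a treatment-by-baseline interaction implies can be seen by asking `margins` for the treatment effect at several baseline values. The sketch below uses simulated data with made-up coefficients, and it assumes the baseline outcome is continuous, hence the `c.` prefix.

```stata
* Simulated data in which the treatment effect shrinks as baseline rises
* (all values are illustrative assumptions)
clear
set seed 4321
set obs 1000
generate treatment = mod(_n, 2)
generate double outcome_pre = 3 + rnormal()
generate double outcome_post = 1 + 0.6*outcome_pre + 0.5*treatment - 0.2*treatment*outcome_pre + rnormal()

* Treatment interacted with the continuous baseline outcome
regress outcome_post i.treatment##c.outcome_pre

* Estimated treatment effect at three chosen baseline values
margins, dydx(treatment) at(outcome_pre=(1 3 5))
```

In this model there is no single treatment effect: `margins` reports a different one at each baseline value requested.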




          • #6
            Clyde, I appreciate your patience with me and owe you big thanks for these amazing explanations! Finally, things are starting to become clearer with respect to the differences between anova and mixed models as well as meanings of interactions. I simply thought interaction with the outcome_pre is similar to interaction with time, which was quite wrong clearly.

            I have one more follow-up question, presented in three scenarios :

            1) If I were to write the equivalent of a three-way interaction like the second model in my #1 in an analysis of covariance model, what would it look like? Or is it even possible to write the equivalent of a three-way interaction in an OLS model using only outcome_post as the DV?

            I am just curious to know how to incorporate the time effect in the OLS model. If the effect of the treatment variable in the OLS model (second model in your #5) is equivalent to the effect of treatment##time (assuming "prepost" is time) in the mixed model (first model in #5), then could the following model capture something similar?

            The interaction term says, controlling for the effect of baseline outcome variable, the effect of treatment on DV is contingent on the biology knowledge. But this does not capture the effect of treatment on the change in the outcome variable, I understand.
            Or because we are controlling for the baseline outcome variable, and given that treatment variable in the OLS model is substantively the same as the treatment##time in the mixed model, could this interaction between treatment and biology knowledge be the equivalent of the three-way interaction in the mixed model?

            Code:
            regress outcome_post i.treatment##i.biology_knowledge c.outcome_pre
            2) Following the same logic you presented in #2, the three-way interaction in the following model would not be the equivalent, I understand. The three-way interaction in the following model says the effect of treatment on outcome_post depends on the biology knowledge, and this effect is moderated by the outcome_pre variable. I guess it also could mean the effect of treatment depends on the value of baseline outcome variable, and this effect varies by the DNA Knowledge. Or does the order of the terms in the interaction matter?


            Code:
            regress outcome_post i.treatment##i.biology_knowledge##c.outcome_pre
            Either way, this is not what the three-way interaction in the mixed model in my #1 means, I would think.


            3) The only other way that seems plausible to me is to change the DV to a change variable (outcome_post-outcome_pre-- let's call it outcome_change) and then using the interaction of treatment and biology_knowledge in the OLS model. In this second model below, the interaction term would mean the effect of treatment on the change in the outcome variable varies by biology_knowledge, and I find this closest to a three-way interaction among treatment, time and biology knowledge in the mixed model. Would you agree?

            Code:
            regress outcome_change i.treatment##biology_knowledge


            I appreciate your patience, and am thrilled that I have the opportunity to ask these questions of an expert on this platform.



            • #7
              Code:
              regress outcome_post i.treatment##i.biology_knowledge c.outcome_pre
              would, indeed, be the analysis of covariance analog to the three-way interaction model.

              2) Following the same logic you presented in #2, the three-way interaction in the following model would not be the equivalent, I understand. The three-way interaction in the following model says the effect of treatment on outcome_post depends on the biology knowledge, and this effect is moderated by the outcome_pre variable. I guess it also could mean the effect of treatment depends on the value of baseline outcome variable, and this effect varies by the DNA Knowledge. Or does the order of the terms in the interaction matter?


              Code:
              regress outcome_post i.treatment##i.biology_knowledge##c.outcome_pre
              Either way, this is not what the three-way interaction in the mixed model in my #1 means, I would think.
              Your conclusions are all correct. And, no, the order of the terms in the interaction does not matter.

              3) The only other way that seems plausible to me is to change the DV to a change variable (outcome_post-outcome_pre-- let's call it outcome_change) and then using the interaction of treatment and biology_knowledge in the OLS model. In this second model below, the interaction term would mean the effect of treatment on the change in the outcome variable varies by biology_knowledge, and I find this closest to a three-way interaction among treatment, time and biology knowledge in the mixed model. Would you agree?

              Code:
              regress outcome_change i.treatment##biology_knowledge
              Yes, again, your conclusions are all correct. Just to be clear, this is a different model from any of the others mooted to this point. In this model, the difference between the final and baseline outcomes is the dependent variable. So this is a less flexible model than any of the others. It is actually algebraically equivalent to an analysis of covariance model such as
              Code:
              regress outcome_post i.treatment##biology_knowledge outcome_pre
              but with the coefficient of outcome_pre constrained to 1. That is also equivalent to a bivariate normal joint distribution for outcome_pre and outcome_post with the correlation equal to zero (and with a normal distribution this in turn implies that the pre- and post- outcomes are actually independent). So while it is a perfectly legitimate model, you can see that it embeds some assumptions into it that are rather unnatural for most situations.
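The constraint on the coefficient of outcome_pre can be checked numerically with `cnsreg`. This is a sketch on simulated data with made-up values; the scalar names `b_change` and `b_cns` are hypothetical.

```stata
* Simulated wide data; every number here is an illustrative assumption
clear
set seed 777
set obs 1000
generate treatment = mod(_n, 2)
generate biology_knowledge = floor(3*runiform())
generate double u = rnormal()
generate double outcome_pre = 3 + u + rnormal()
generate double outcome_post = 3 + u + 0.5*treatment + 0.3*treatment*(biology_knowledge == 2) + rnormal()

* Change-score model
generate double outcome_change = outcome_post - outcome_pre
regress outcome_change i.treatment##i.biology_knowledge
scalar b_change = _b[1.treatment#2.biology_knowledge]

* Analysis of covariance with the coefficient of outcome_pre fixed at 1
constraint define 1 outcome_pre = 1
cnsreg outcome_post i.treatment##i.biology_knowledge outcome_pre, constraints(1)
scalar b_cns = _b[1.treatment#2.biology_knowledge]

display "change-score interaction:      " scalar(b_change)
display "constrained ANCOVA interaction: " scalar(b_cns)
```

The two interaction coefficients agree up to numerical precision, because fixing the coefficient at 1 amounts to moving outcome_pre to the left-hand side of the equation.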



              • #8
                Wonderful! This all makes perfect sense. Thanks for clarifying things for me! Much appreciated!
