  • Including baseline outcome value as a covariate for the linear and quadratic growth terms in a multilevel mixed-effects growth model

    Hello Everyone,

    I am hoping for some guidance on whether it's OK to include the baseline (i.e., "initial") value of the outcome as a covariate of the linear and quadratic growth terms in a mixed-effects model. The outcome here is BMI measured at 6 time points (T1-T6), although time is a continuous variable that can differ from person to person. We think BMI will undergo quadratic growth, and we are primarily interested in whether the growth differed between intervention and control. However, in secondary moderator analyses we want to see whether the linear and/or quadratic growth rates differed depending on initial BMI at baseline (e.g., participants with higher initial BMI may have smaller acceleration and/or linear growth, while those with lower initial BMI may have larger acceleration and/or linear growth). We also think that the linear and/or quadratic intervention effects could differ depending on initial BMI.

    Does anyone know if adding baseline BMI as covariate for the linear and quadratic growth terms is acceptable? Is it a problem that this "moderator" is actually the outcome at T1? If it's a problem, what are some possible ways to handle this? Any references supporting or refuting this approach would be highly appreciated!

    See below for more context, and how I'm considering implementing this in Stata (feedback welcome here too!):

    Below is the original model (without baseline BMI as a covariate). Note, the original model allows boys and girls to have different initial BMI (but not different growth curves), and it also allows for people at different ages at baseline to have different growth curves (e.g., someone who was 3 at baseline is expected to be at a different part of the growth curve than someone who was 6 at baseline).

    Code:
    xtmixed bmi    ///
    i.cfemale i.trt c.agemoscen    ///
    c.time i.trt#c.time c.agemoscen#c.time    ///
    c.time2 i.trt#c.time2 c.agemoscen#c.time2    ///
    || indexchildid: c.time c.time2, cov(un) mle

    Below is how we think the model with baseline BMI included might look (with 4 additional terms: 2 for linear and 2 quadratic). I do not include the baseline BMI as a covariate for the intercept, because they are essentially synonymous.

    Code:
    xtmixed bmi    ///
    i.cfemale i.trt c.agemoscen    ///
    c.time i.trt#c.time c.agemoscen#c.time c.basebmi#c.time i.trt#c.basebmi#c.time    ///
    c.time2 i.trt#c.time2 c.agemoscen#c.time2 c.basebmi#c.time2 i.trt#c.basebmi#c.time2    ///
    || indexchildid: c.time c.time2, cov(un) mle
    Thank you very much!

  • #2
    This approach is perfectly reasonable. There is just one technical problem: you need to exclude the T1 observations from the estimation sample. Then you will have a well specified model that reflects the sources of variation you described.

    One other minor point: if you are using current (version 14) Stata, the command has been renamed -mixed-. The parser will still accept -xtmixed-, but it has gone undocumented, and presumably in the near future will only run under version control. So switch now to -mixed- if you are on version 14. A few of the default options work differently when you specify -mixed-, so check the manual. For example, mle is now the default estimation method and does not need to be specified when you call it -mixed-.
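
    A minimal sketch of both points above (dropping T1 from the estimation sample and switching to -mixed-), using the moderator model from #1. It assumes the data are in long format and that a visit counter exists; the name -wave- (1 = T1, ..., 6 = T6) is hypothetical and does not appear in the thread. The -generate- line is only needed if basebmi has not already been created.

    ```stata
    * Assumption: long-format data with a hypothetical visit counter -wave-
    * (1 = T1, ..., 6 = T6). Copy each child's T1 BMI onto all of that child's rows;
    * children without a T1 row get a missing baseline value.
    bysort indexchildid (wave): gen double basebmi = bmi[1] if wave[1] == 1

    * Fit the moderator model on T2-T6 only. -mixed- uses ML by default,
    * so the mle option is dropped.
    mixed bmi ///
        i.cfemale i.trt c.agemoscen ///
        c.time i.trt#c.time c.agemoscen#c.time c.basebmi#c.time i.trt#c.basebmi#c.time ///
        c.time2 i.trt#c.time2 c.agemoscen#c.time2 c.basebmi#c.time2 i.trt#c.basebmi#c.time2 ///
        if wave > 1 & !missing(wave) ///
        || indexchildid: time time2, covariance(unstructured)
    ```

    The -if- qualifier goes before the || that starts the random-effects equation, so the T1 rows stay in the dataset but are left out of estimation.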


    • #3
      Thank you for the helpful reply, Clyde! I saw that change from xtmixed to mixed, and I do use version 14. I'll look into updating my code, just in case.

      Yes, I was wondering if we'd have to take the T1 observations out of the outcome (to prevent them from being both the T1 outcome and a covariate in the model).

      My follow-up question is: Does doing this have any other implications for the model or its interpretation? The new "intercept" is now T2 BMI, which is obviously no longer pre-randomization, but I'm wondering if there's anything else that may be different or may need to be adjusted to accommodate this approach. For example, would it be important to add two additional terms for T1 BMI as a covariate for the new intercept (now T2 instead of T1) making the model:

      Code:
       
       xtmixed bmi    ///
       i.cfemale i.trt c.agemoscen c.basebmi i.trt#c.basebmi    ///
       c.time i.trt#c.time c.agemoscen#c.time c.basebmi#c.time i.trt#c.basebmi#c.time    ///
       c.time2 i.trt#c.time2 c.agemoscen#c.time2 c.basebmi#c.time2 i.trt#c.basebmi#c.time2    ///
       || indexchildid: c.time c.time2, cov(un) mle
      The "time" variable is actually coded as time since baseline, which I assume is still acceptable to use with this approach. Thanks again! You are very much appreciated.
      Last edited by Evan Sommer; 07 Feb 2017, 20:33.


      • #4
        You don't have to change anything else in the coding. What you do have to be aware of is that the interpretation of the results is now different. Before you included the baseline values as covariates, you had a plain vanilla growth model (OK, not plain vanilla because you have quadratics, but never mind that). The coefficients of time and time squared would simply have been the description of a growth trajectory over time.

        Now, two things have happened. First, the T1 observations are gone. So the piece of the trajectory between T1 and T2 is no longer accounted for directly in the coefficients. Second, you have added new covariates to the model, so that you are now adjusting for the baseline values. Including the baseline values as covariates attenuates the growth coefficients, because each one becomes a mixture of the growth rate and the intra-class correlation of the BMI outcome. Since you have 6 time periods, neither of these changes will be particularly large, and you will most likely get results pretty similar to what you had in the first model. I would ordinarily recommend that you stick with the first model, but since you believe that there are actually interactions between your treatment and the baseline BMI, you really have to go to the second model, or something like it.

        Pondering your situation a bit more, I have a couple of ideas you might consider. Your model is already pretty complicated. You have introduced an age variable because you expect subjects of different ages to start out at different places on the trajectory. Quite so. But then why do you need the time variable at all? Why not use age instead of time? Do you have any reason to expect that time elapsed from baseline has a different effect on BMI from the effects of aging itself? I can also tell you that within subject, time and age will be perfectly collinear, which could make model estimation difficult. You may fail to achieve convergence, hitting flat likelihood regions.

        Next, are you sure you need the quadratic terms? I realize that BMI vs time is not a linear function over extended periods. But it is not really quadratic either: it does not peak and then decrease (at least not unless severe illness supervenes). It just grows at a decelerating rate. Since there is no real U-shape in the data, just a curvilinearity that never peaks, the quadratic is a mis-specification. You will find that your fitted quadratic curve will be one that has its vertex outside the range of observed times in your data. So all that quadratic term is doing is providing some deceleration, and doing it at least somewhat incorrectly at that. You might better capture the curvilinearity by modeling BMI as a function of the square root of age, or the cube root, or the logarithm, or something like that. Those functions will show a deceleration in growth but will not pretend to peak and then decline. (I know, lots of people model BMI as a quadratic function of age and you see it in the literature a lot, but that doesn't make it right. It's clearly wrong in principle; the only issue is how much damage it does.) Going even a step beyond, if the time elapsed between T1 and T6 is sufficiently short, then the deceleration in BMI growth will barely show up. In that situation, a linear model might actually be just as good as any alternative, and it would be a lot simpler to work with. Take a look at some BMI growth charts and see how the curves look when restricted to the range of ages in your actual data. You may find that over that narrow range a straight line is an excellent approximation.
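
        As a rough sketch of how one might check these two points: the first part estimates where the fitted parabola turns, so it can be compared with the observed range of time; the second fits a logarithmic alternative. The running-age variable -agemos- (age in months at each visit) is hypothetical; the thread only describes age at baseline.

        ```stata
        * Run right after fitting the quadratic model from #1.
        * Vertex of the fitted parabola for the reference group
        * (trt = 0, agemoscen = 0): -b_time / (2 * b_time2).
        nlcom (vertex: -_b[bmi:time] / (2 * _b[bmi:time2]))

        * Compare the vertex with the observed range of time.
        summarize time

        * Alternative functional form: BMI as a function of log age.
        * Assumption: -agemos- is a hypothetical running age in months at each visit.
        gen double lnage = ln(agemos)
        mixed bmi i.cfemale i.trt c.lnage i.trt#c.lnage ///
            || indexchildid: lnage, covariance(unstructured)
        ```

        If the estimated vertex falls well outside the minimum and maximum reported by -summarize-, the quadratic term is only supplying curvature, which is the situation described above.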



        • #5
          Correction to #4. I realize that I misread your variable agemoscen to be a running age variable. But you imply in #1 that it is actually age at baseline. So my comment about age and time being completely collinear within subject is wrong. However, it is not going to convey much information that is not being picked up by the baseline bmi variable, so you are likely to have wide standard errors around both of those. Moreover, as a baseline age, it is now constant within subject, so it is unnecessary to have it when the random intercept at the subject level can capture it. So unless you specifically want an estimate of the effect of age at baseline, I think you are better off leaving it out: it's just going to blur the estimates of baseline bmi effect and random intercepts, possibly to the extent of making convergence difficult.


          • #6
            Very interesting, thanks again! It's refreshing to see you going through a very similar thought process to ours, and I appreciate the time you are devoting to these questions!

            You are correct about the agemoscen variable being age at baseline, which is constant over time and not a running age variable (it's ambiguously named, so nice catch!). This avoids the collinearity of time with age issue.

            I didn't expect to get so deep into the context, so let me provide some more! The first model above (that uses all 6 time points and does not include a covariate for baseline BMI) is our pre-specified primary model, so we'll be sticking with it for our primary outcome analysis (see below for the point about quadratic growth). The second model with baseline BMI as a moderator interacting with intervention effect is a secondary, exploratory model that we're still thinking about, and we have much more leeway with how we might add or subtract covariates. So, thankfully, this is not an "either or" decision.

            With respect to the baseline BMI model, we can indeed do things like take out T1 BMI from the outcome, and remove the age at baseline covariates if there is a collinearity issue with baseline BMI, for example. It sounds like you recommend we do both of these things, and I'll be sure to take that into account as we settle this model, so thanks again!

            Next, I'd like to address the quadratic vs. linear vs. something else issue. I completely agree with your thoughts about modeling BMI in general and being cautious about applying a functional form that may not be appropriate. When considering this, the age of the sample is the key. Our participants were 3-5 at baseline, and they will be 6-8 at the end of the study. Also note, they were between the 50th and 85th percentiles on BMI at baseline. See the highlighted area on the CDC's BMI growth chart below (for girls as an example). This particular part of the BMI growth trajectory for children seems to lend itself well to a quadratic model for most of our participants, so we'll be sticking with that approach for both of these models (although we may attempt to compare other models down the road).

            [Image: Sample BMI Chart.JPG (CDC BMI growth chart for girls, with the sample's range highlighted)]

            Note that the main reason the primary model has baseline age covariates for the intercept, linear, and quadratic growth parameters is because the children who were older at baseline (5-5.9) probably will have the kind of growth you describe (practically linear, with a near-zero quadratic term), while the younger children will probably have quadratic growth (a larger quadratic term). The approach in the primary model allows for this. It is my understanding that the approach you recommend for the moderator analysis also allows for this, except it uses baseline BMI instead of baseline age. According to this chart, we might actually expect higher baseline BMI to predict a larger quadratic term because higher baseline BMI may indicate a younger child. Similarly, we might expect lower baseline BMI to predict a smaller (near zero) quadratic term because it may indicate an older child.

            However, given the BMI variability within the shaded region above, I'm wondering if the severity of age-BMI collinearity is enough to warrant not explicitly modeling both in order to truly isolate the potential effects of baseline BMI, and separate them from baseline age. That said, I do share your concerns about convergence with such a complex, interrelated model.

            To summarize, your recommendations for the second, baseline BMI moderator analysis are to 1) take out T1 BMI from the outcome, 2) remove the age at baseline (agemoscen) covariates from the model to avoid potential collinearity with baseline BMI, and 3) include the 2 baseline BMI terms for each of the 3 growth parameters (intercept [which is now T2], linear growth, and quadratic growth). Like this:

            Code:
            xtmixed bmi    ///
            i.cfemale i.trt c.basebmi i.trt#c.basebmi    ///
            c.time i.trt#c.time c.basebmi#c.time i.trt#c.basebmi#c.time    ///
            c.time2 i.trt#c.time2 c.basebmi#c.time2 i.trt#c.basebmi#c.time2    ///
            || indexchildid: c.time c.time2, cov(un) mle

            Please let me know if I have any of this wrong, and thanks again!


            • #7
              You are thinking very clearly about this! I must admit that in my remarks about the quadratic model and the collinearity of baseline age and baseline BMI I did not consider that you might be working in the age range where there is, indeed, a U-turn. I was thinking more about BMI trajectories in teens and adults--which are what I have worked with in my own research. So I retract recommendation 2), which was based on those assumptions. That said, I would start with the model you show at the end of post #6, and then also add baseline age (and its interactions with linear and quadratic time) and see a) if that interferes with convergence [I now suspect it won't], and b) if you still converge, whether it changes the model substantially [an LR test and AIC or BIC would help, but also look at predicted values across the range of ages and see if they change enough to matter].
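
              A sketch of that comparison, under the same assumption as earlier in the thread that T1 rows are excluded through a visit counter; the name -wave- (1 = T1) is hypothetical. Both models are restricted to the same observations so that the likelihood-ratio test compares like with like.

              ```stata
              * Model from the end of #6, fit by ML (the -mixed- default) on T2-T6.
              * Assumption: -wave- is a hypothetical visit counter (1 = T1).
              mixed bmi i.cfemale i.trt c.basebmi i.trt#c.basebmi ///
                  c.time i.trt#c.time c.basebmi#c.time i.trt#c.basebmi#c.time ///
                  c.time2 i.trt#c.time2 c.basebmi#c.time2 i.trt#c.basebmi#c.time2 ///
                  if wave > 1 & !missing(wave, agemoscen) ///
                  || indexchildid: time time2, covariance(unstructured)
              estimates store no_age

              * Same model plus baseline age and its interactions with time and time2.
              mixed bmi i.cfemale i.trt c.basebmi i.trt#c.basebmi c.agemoscen ///
                  c.time i.trt#c.time c.basebmi#c.time i.trt#c.basebmi#c.time c.agemoscen#c.time ///
                  c.time2 i.trt#c.time2 c.basebmi#c.time2 i.trt#c.basebmi#c.time2 c.agemoscen#c.time2 ///
                  if wave > 1 & !missing(wave, agemoscen) ///
                  || indexchildid: time time2, covariance(unstructured)
              estimates store with_age

              * Likelihood-ratio test of the three added age terms, then AIC and BIC.
              lrtest no_age with_age
              estimates stats no_age with_age
              ```

              The models are nested and both are fit by maximum likelihood, which is what -lrtest- needs when the fixed-effects specification differs.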


              • #8
                Excellent! Thank you for the compliment and great advice! I think I have a much better understanding of how we might approach baseline BMI as a moderator within the context of mixed-effects growth modeling.

                Finally, if you have any references that might support this method of moderation analysis, I would greatly appreciate that as well! The closest thing we could find was a reference to doing something like this in an SEM context: Muthen & Curran (1997). General Longitudinal Modeling of Individual Differences - A Latent Variable Framework for Analysis and Power Estimation. This doesn't quite apply to the specific modeling strategies here though!

                Thanks again!


                • #9
                  Going back to #2:
                  "There is just one technical problem: you need to exclude the T1 observations from the estimation sample."
                  Can anyone kindly point me to a reference on this?

                  What if the baseline variable is log transformed? Would it then be okay to include T1 observations?

                  Your comments are much appreciated.

                  Al Bothwell


