  • Help with xtmixed to compare groups and repeated measures data

    Hi,
    I have the following dataset:
    two groups of samples, TOP (9) and BOTTOM (10), which I have classified with the dummy variable GRP, 0 and 1 respectively. I have performed experiments on these samples to measure the outcome/dependent variable OUT, measuring this at baseline, and then at 6 further timepoints (CONC, for concentration) where a drug concentration was increased incrementally at each timepoint.

    I am interested to know the following:
    i. is there a difference overall between groups TOP and BOTTOM? (As these are the only possible groups for an individual sample to belong to, I assume GRP is an explanatory variable rather than a level, so level 1 is the repeated measures and level 2 is the individual sample.)
    ii. is there a difference in either group in the outcome variable with increasing concentration of the drug? i.e. is there a significant increase/decrease in OUT with increasing CONC (drug concentration)?

    I have done quite a bit of reading, and as far as I can see, I would begin with something like

    xtmixed OUT CONC

    What I am not sure about is:
    i. the precise order of the next steps - do I need to build the command layer by layer, e.g. first add random intercept, then random slope, then add GRP as a fixed explanatory variable?
    ii. how to test the significance of the differences I am interested in (above)
    iii. how to generate mean predicted values for each group (TOP and BOTTOM) at each drug concentration (CONC) so I can plot the two predicted group lines

    Any help would be gratefully received.

    Jem

  • #2
    Your question
    i. is there a difference overall between groups TOP and BOTTOM?
    is unclear. Are you asking if the groups differ in the baseline values of the outcome? Or do you want to know whether the groups show different dose-response curves? Or something else about the groups? Your second question is also unclear. Are you asking to look at each dose level to determine whether the groups exhibit different outcomes? Or, again, should this be interpreted as a question about the overall dose-response trajectories of the groups?

    Another key element not specified here is what type of dose-response curve you expect. Is it linear? Or biphasic? Or perhaps it rises to a peak and then declines? Or what? The right way to include CONC in your model depends crucially on this.

    With regard to the items you said you are not sure about:

    i. Building a multi-level model one step at a time is usually a good idea. I would probably work in all the explanatory variables in a single-level model first, making sure I have good specifications for the dose variable that correspond to the anticipated shape of the dose-response curve. Then I would add in the random intercept, and then the random slope.

    ii. Since it's not clear what you're interested in, it's hard to guide you here. I will say this: if you are going to be contrasting the groups at some or all time points, your model probably needs to include a GRP#CONC interaction term.

    iii. See the -margins- and -marginsplot- commands. It should be easy as pie!


    • #3
      Hi Clyde,

      thanks a lot for your response. Re: your first point, yes, I see what you mean. I certainly wish to know if there is a significant difference at baseline measurement. If I ran the following:

      xtmixed OUT CONC GRP

      am I right in thinking the P>|z| in the fixed effects table indicates whether there is a baseline difference between groups (based on this simple regression)?
      I am also interested in whether there is a significant increase/decrease for each group between baseline and max drug concentration. And finally, if possible, whether there is an overall difference between groups in their dose-response curves; so these latter two questions are similar to what I think you can get if you run a two-way repeated-measures ANOVA.

      Regarding the type of dose-response curve, it is difficult to say. If I plot the data with CONC on the x axis as regular intervals (i.e. 0, 1, 2, 3 etc) representing the timepoints at which incremental concentrations are administered, the plots are reasonably linear. That was going to be another question - whether I should have concentration (e.g. 10^-6 mmol/L) as the x axis. But I'm happy to keep it simple for now - so assume linear and regular timepoints rather than concentration.

      Re: interaction term - yes, I should have thought of that.

      I couldn't see how to delete the duplicate post, but if you know how, could you let me know?

      Jem


      • #4
        With CONC = 0 as the coding for the baseline value, yes, the GRP row of your regression output shows the effect of GRP at baseline.

        If you want to assume linearity and regular timepoints rather than concentration, then --mixed OUT i.GRP##c.CONC-- is your starting point, and then you can add in the random intercept and slope.

        Pretty much everything you want after that you can get from the -margins-, -marginsplot-, and -contrast- commands.
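
        As a rough sketch of that workflow in Stata 12 syntax, assuming the variables are named OUT, GRP, CONC and id as in this thread, and assuming CONC is coded 0 to 6 (baseline plus six timepoints; adjust the range if your coding differs):

        ```stata
        * Random-intercept, random-slope model with a group-by-dose interaction
        * (the command is -mixed- in Stata 13 or later)
        xtmixed OUT i.GRP##c.CONC || id: CONC, mle variance

        * Predicted mean of OUT (fixed portion of the model) for each group
        * at each timepoint; the 0(1)6 range is an assumption about the coding
        margins GRP, at(CONC=(0(1)6))

        * Plot the predicted group lines with their confidence intervals
        marginsplot
        ```

        -margins- here reports predictions from the fixed part of the model only, so the plotted lines are the two group-average trajectories rather than sample-specific fits.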

        As for the broader question of whether to replace CONC by the actual drug concentrations, if the drug concentration is actually linearly related to the timepoint, then it makes no substantive difference which you use. It's just a different scale and units. But if the actual drug concentrations are noisy or bear a non-linear relationship to the time point, then those are different models. Which is more relevant to practice really depends on whether it is possible to actually obtain a pre-specified concentration at will, or whether all one can control is the number of doses administered, the concentration then being a random response, and whether you are studying pharmacology/physiology or clinical epidemiology.

        I don't think it's possible to delete duplicate posts. My posting of that remark was just to alert other forum readers and, if you were deliberately reposting, to point out that it would be better not to do that.




        • #5
          Thanks Clyde. When you have written --mixed OUT i.GRP##c.CONC-- could I ask if the 'i' before GRP is necessary (I have not been putting it in), and what the 'c' is for before CONC?

          Should my model development go something like this then?

          1. xtmixed OUT CONC GRP /// to look for baseline difference between groups - if P>|z| <=0.05 this indicates a difference at baseline

          2i. xtmixed OUT CONC GRP || id:, mle variance nostderr /// random intercept model; nostderr as allows estimates in cases where model otherwise fails to converge

          or

          ii. xtmixed OUT CONC GRP || GRP:, mle variance nostderr /// GRP is an explanatory variable and not a level, but seems more natural to fit random intercept for GRP. What do you suggest?

          3. Compare 2i or 2ii with 1 using lrtest, and if significant, accept 2 and move on

          4i. xtmixed OUT CONC GRP || id: CONC, mle variance nostderr

          or

          ii. xtmixed OUT CONC GRP || GRP: CONC, mle variance nostderr /// random slope model to test for difference in dose-response curves between groups? Again, not sure if 'id' or GRP should go in random part.

          5. compare 4i or 4ii with 2 using lrtest, and if significant accept and move on


          Just re-read your post - so perhaps this would go in between 3 and 4?

          xtmixed OUT CONC GRP GRP#CONC || id:, mle variance nostderr


          Jem


          • #6
            First of all, you need to have the interaction terms in there. Some of your models above have that, but some omit them.

            The easiest specification (Stata 13) is

            Code:
            mixed OUT i.GRP##c.CONC
            The i. is optional--interaction terms assume i. by default. The c. is absolutely necessary if you want to specify a linear relationship between CONC and OUT. Without the c., CONC will be treated as a discrete variable and will be represented by 6 dummy variables. Without c., you will not get any estimate of a slope for concentration overall: just differences between outcome at baseline and outcome at each particular time point.

            Note also the double ##. That will tell Stata to include both main effects and the interaction.
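
            To see the difference between c. and i. in the output, the two specifications can be run side by side. This is only an illustrative sketch: it assumes the variable names used in this thread, that CONC holds non-negative integer timepoint codes (required for i.), and it uses the Stata 12 command name with a random intercept on id.

            ```stata
            * c.CONC: CONC enters linearly, so the fixed part has one slope for
            * the reference group plus one slope difference for the other group
            xtmixed OUT i.GRP##c.CONC || id:, mle variance

            * i.CONC: CONC is categorical, so the fixed part has one indicator
            * per non-baseline timepoint, each with its own interaction with GRP
            xtmixed OUT i.GRP##i.CONC || id:, mle variance
            ```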

            If you are using Stata 13, the command is now -mixed-, not -xtmixed-. And, similarly, mle and variance are now the defaults and need not be explicitly specified. As for -nostderr-, that's up to you and what kind of output you want. If you are using Stata 12, then all of those specifications are needed to get an output with variance components specified in the variance metric and ML estimation.

            The grouping variable in your data is id, so it is -|| id:- that you need for a random intercept at the sample level, and -|| id: CONC- for a random intercept plus a random slope on CONC at the sample level.
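
            A minimal sketch of the two random-effects specifications and a likelihood-ratio comparison between them, assuming the variable names from this thread and Stata 12 syntax. The stored-estimate names ri and rs are arbitrary labels chosen for this example.

            ```stata
            * Random intercept only, at the id level
            xtmixed OUT i.GRP##c.CONC || id:, mle variance
            estimates store ri

            * Random intercept plus a random slope on CONC, at the id level
            xtmixed OUT i.GRP##c.CONC || id: CONC, mle variance
            estimates store rs

            * Likelihood-ratio test of the added random slope
            lrtest rs ri
            ```

            Both models are fitted by ML with the same fixed part, which is what makes the -lrtest- comparison meaningful. Because the null value of a variance lies on the boundary of the parameter space, Stata flags this kind of test as conservative.
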
            Last edited by Clyde Schechter; 20 Nov 2014, 18:31.


            • #7
              Thanks again Clyde - really helpful. Sorry, I should have specified - I'm using Stata 12. I think there are some differences in the syntax cf Stata 13, other than the obvious 'xt' mixed.

              I see - so id is the grouping variable. That makes things easier.

              I'm going to have a play around a bit later to check I understand things fully.

              Jem


              • #8
                Dear Clyde,

                I have had a chance today to review your previous comments and do some further reading. To recap, the questions I want to answer are:

                1. is there a significant difference between groups at baseline?
                2. is there a significant change for group TOP between baseline and the other timepoints (in particular the last)? (And the same for group BOTTOM).
                3. is there a difference between groups in the shape of their responses over the repeated measures/increasing drug concentration, e.g. does one response curve remain flat, and the other decrease from baseline?

                Firstly, when you refer to 'main effects', do you mean the standalone/non-interaction effect of the variable in the fixed part of the model? For example, in the command

                xtmixed OUT CONC GRP i.GRP#c.CONC /// CONC and GRP are main effects

                Secondly, noting your comments about use of i. and c. in the command, I'm not quite sure what to do about CONC as an independent variable. The samples underwent repeated measures, with similar intervals between measurements. However, at each measurement timepoint, the drug concentration of the solution was incremented by me in the following manner: 0, 10^-9M, 10^-8M, 3x10^-8M, 10^-7M, 10^-6M. So non-linearly associated with the timepoints.
                Given that they are repeated measures at roughly equal intervals, it seems to me the best way to model this is with CONC as an ordinal/continuous variable taking the timepoint (0,1,2,3 etc), rather than to try to incorporate the actual concentration (0, 10^-9, 10^-8 etc). I am not actually interested in trying to make predictions based on my data; only to test if differences between groups exist. As such, would it perhaps be better to omit the (c.) from the command prior to CONC? What do you think?

                Finally, I am still not clear if I need to build the model one step at a time, testing each expanded step with the preceding step using lrtest (e.g. before and after including GRP#CONC interaction, or before/after adding || id: random effect). Could I simply go straight to

                xtmixed OUT CONC GRP i.GRP#(c.)CONC || id: CONC, mle variance

                Sorry not to have a graph to illustrate this yet - I note what you said previously about margins/marginsplot and will try to do this to make it clearer to understand my data.

                As before, your assistance is much appreciated.

                Jem


                • #9
                  So, you have correctly understood my meaning for main effects. But let me suggest that you not use the notation you have chosen there. Better is:

                  Code:
                  xtmixed OUT i.GRP##c.CONC
                  The ## notation works in Stata 12 and it guarantees you that if you switch between c.CONC and i.CONC (as you propose later) Stata will automatically change both the main effect and the interaction effect of CONC and keep them in synch. Otherwise you would risk getting something like:

                  Code:
                  // BAD MODEL
                  xtmixed OUT i.GRP CONC i.GRP#i.CONC
                  where you have CONC as a continuous/ordinal main variable but CONC is specified as a discrete variable in the interaction term--a model that probably would make no sense.

                  If your model is really as simple as we have been discussing, going directly to the random slope model is probably OK without stepping your way up. (But don't put parentheses around the c.). Worst comes to worst it will fail to converge and then you will have to step it back to see how close to that model you can come. But with more complicated models that include several covariates or more levels of hierarchy, it is safer to start simple and build up the complexity one step at a time.

                  Note also that if you go to CONC as a discrete variable, then the notation -|| id: CONC- will not work as you intend. In that case, you need to specify the random slope as -|| _all: R.CONC- to assure that CONC is treated as a discrete variable everywhere in the model.

                  If you go to CONC as a discrete variable, then you will be testing, in question 3, for the presence of arbitrary patterns such as Group 1 has higher OUT than Group 0 at CONC = 1, 2, 5, 9, and 10 but lower than Group 0 at CONC = 3, 4, 6, 7, and 8. (Or any other pattern of differences.) If that is what you want, then treating CONC as a discrete variable is appropriate. Generally in pharmacologic/physiologic studies though, one is looking for some more orderly dose-response pattern such as flat or linear, or U-shaped or inverted U-shaped. In that case it is better to represent CONC as a continuous variable (for flat or linear) or to capture a U-shape by a quadratic term or using splines.
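
                  For the curved alternatives mentioned above, two possible specifications are sketched here. They assume the variable names from this thread; the knot at CONC = 3 and the new variable names conc_lo and conc_hi are arbitrary choices made for illustration, not anything suggested by the data.

                  ```stata
                  * Quadratic dose-response: linear and squared CONC terms,
                  * each allowed to differ between groups
                  xtmixed OUT i.GRP##c.CONC##c.CONC || id: CONC, mle variance

                  * Linear spline: separate slopes below and above a knot at CONC = 3
                  * (knot location and new variable names are illustrative only)
                  mkspline conc_lo 3 conc_hi = CONC
                  xtmixed OUT i.GRP c.conc_lo c.conc_hi i.GRP#c.conc_lo i.GRP#c.conc_hi ///
                      || id: CONC, mle variance
                  ```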


                  • #10
                    Addendum:

                    On the issue of whether to use concentration or time, I note that the concentration-time relationship is nearly exponential. It follows that if your outcome response is linear in concentration it will be highly non-linear in time, and vice versa. (Of course, it may well fail to be linear in either.) I think that you would be well advised to look at some scatterplots of outcome vs time and concentration (perhaps separately by GRP) to see if you can figure out which representation gives a more manageable picture of what is going on, and then use that in your modeling. These graphs may similarly shed light on your discrete vs. ordinal/continuous dilemma.
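
                    A sketch of such plots, assuming CONC holds the timepoint codes as elsewhere in this thread. MOLAR is a hypothetical variable name standing in for the actual drug concentration, which does not appear under any name in the posts above.

                    ```stata
                    * Outcome against timepoint, one panel per group, with a lowess smoother
                    twoway (scatter OUT CONC) (lowess OUT CONC), by(GRP)

                    * Same plot against the actual concentration
                    * (MOLAR is a hypothetical variable; substitute your own)
                    twoway (scatter OUT MOLAR) (lowess OUT MOLAR), by(GRP)
                    ```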


                    • #11
                      Thanks Clyde. I think I have grasped the i. and c. notation now - very useful. I am going to produce some plots as you suggest before going any further. Need to read up/refresh re: margins/marginsplot/contrast though first.

                      Jem


                      • #12
                        Hi Clyde,

                        Here are a couple of plots of my data - the first is a plot of the mean values for each group at each concentration - plotted using twoway connected. I read about margins in the handbook, but they did not seem to make sense to me. At least, I could not see why you would choose to do this rather than plot 95% confidence intervals. And with marginsplot, I got three lines, even though there are only two groups...
                        The second figure is a scatter and best fit plot for each individual in my sample, obtained from the raw data (scatter) and after running the command

                        xtmixed OUT i.GRP##c.CONC || id: CONC, mle

                        followed by the predict command, then

                        twoway (scatter...) (line...)... /// to give a visual impression of how well the model is fitting to the data: quite well!

                        Finally, the output from the above command (sorry this comes out small - hope you can read it). Could I check my understanding of the 4 things circled?

                        1 (red): the intercept of GRP0 at CONC 0
                        2 (green): the p-value for null hypothesis that there is no difference between GRP0 (the reference) and GRP1. This indicates the null hypothesis should be rejected: there is a difference. But is this just at baseline, or across all concentrations?
                        3 (blue): I am not sure here. Is it the p-value for the null hypothesis that there is no difference in OUT across all concentrations, but for both groups combined? And as it is <0.05, this suggests OUT does vary with CONC in some way?
                        4 (purple): p-value for the interaction - highly significant, suggesting there is an interaction and that OUT varies with CONC differently for each group
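
                        For reference, each of these quantities can also be requested directly after fitting the model, which may help in checking the readings above. This is only a sketch, assuming GRP is coded 0/1 and CONC is coded 0 to 6. In a model with i.GRP##c.CONC, the GRP coefficient is the group difference at CONC = 0, and the CONC coefficient is the slope in the reference group (GRP = 0) rather than in both groups combined.

                        ```stata
                        xtmixed OUT i.GRP##c.CONC || id: CONC, mle

                        * Predicted OUT for each group at baseline (CONC = 0)
                        margins GRP, at(CONC=0)

                        * Group difference (GRP 1 minus GRP 0) at each timepoint,
                        * not only at baseline; the 0(1)6 range is an assumed coding
                        margins r.GRP, at(CONC=(0(1)6))

                        * Slope of OUT on CONC within each group
                        margins GRP, dydx(CONC)

                        * Slope in GRP 1 = reference-group slope + interaction coefficient
                        lincom CONC + 1.GRP#c.CONC
                        ```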

                        Jem
