  • Best command for repeated measures analysis

    Dear Statalisters,

    I formerly used -xtmixed- for a repeated measures analysis with Stata 12. I am now using Stata 13, and this command has been replaced by -mixed-, and is described in a separate part of the user manual (ME). However, the XT commands remain for longitudinal/panel data.


    I have a new dataset to analyse: we have a cohort of patients, each with one of two types of medical device (A and B), on which a few measurements (X, Y, Z) were made every few months for several years (X,Y,Z are all continuous variables).
    So, e.g. for Patient 1 with device A: X,Y and Z measured at t=0, 1, 4, 9, 12, 15 months etc. Patient 2 with device B: X,Y and Z measured at t=0, 1.5, 3, 7, 11, 14. (t=time in months).

    We wish to compare the measurements for the two groups (X,Y, and Z in patients with device A vs device B) over time, to see if one type of device is more prone to deterioration/failure.
    I planned to compare the variables individually - i.e. compare X in the patients with device A to X in those with device B, and then do the same for Y and Z. I had thought to use the -mixed- command, but after reading the user manual sections on XT (longitudinal data) and ME (mixed effects), I was confused as to whether -mixed- is also suitable for longitudinal data. And I also saw there are the TS (time series) commands...


    So my questions are:
    1. Is there overlap between XT, ME and TS commands, so that with the correct syntax you could analyse the same problem using alternate commands?
    2. With the -mixed- command, as it is a generalised linear model, if there are repeated measures and time is included as a predictive variable, does that make it a longitudinal data command?
    e.g.
    Code:
    mixed Xvalue month || id: month

    3. Given measurements were not made at the same times for each patient, would it be best to group measurements into time periods (e.g. 4-6 month, 10-12 months) to standardise the times at which measurements are made for analysis, or can models such as -mixed- work with such variability?


    Jem


  • #2
    Sorry, code should have read

    Code:
    mixed Xvalue month || id: device
    where month is time in months, and device is either A or B.

    Jem



    • #3
      You do not have time series data here. Time series data would be a single patient with a long series of observations over time. You have longitudinal data (aka panel data in economics/econometrics/finance/sociology).

      -mixed- estimates random effects models with as many levels of hierarchy as you like. -xtreg, re- estimates the same model when there are only 2 levels, although the estimator used to arrive at the results is different. The results of a 2-level model analyzed with -mixed- or with -xtreg, re- will generally be almost identical. That is:
      Code:
      mixed Xvalue month || id:
      // AND
      xtset id
      xtreg Xvalue month, re
      will give essentially identical results.
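
      A minimal sketch of this comparison, assuming Stata's example dataset pig (loaded over the internet with -webuse pig-; variables weight, week and id) in place of the poster's data. The -xtreg, mle- line is an addition for illustration only: it fits by maximum likelihood, as -mixed- does by default, so its estimates should agree with -mixed- more closely still than those from the GLS-based -xtreg, re-.

      ```stata
      // Example data: repeated weight measurements on pigs (assumed stand-in data)
      webuse pig, clear

      // Two-level random-intercept model, fitted by maximum likelihood
      mixed weight week || id:

      // Same model as a panel-data random-effects regression (GLS estimator)
      xtset id
      xtreg weight week, re

      // Random-effects regression fitted by maximum likelihood
      xtreg weight week, mle
      ```

      Comparing the coefficient on week and the estimated variance components across the three outputs is a quick way to see how close the estimators are on a given dataset.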

      When there are more than two levels, the -xt- suite of commands is not applicable, unless you wish to ignore all but two of the levels.

      The inclusion of time as a variable is possible in both the -xt- commands and -mixed- (and, more generally, the -me- commands). Whether it is advisable to do this is a science question answerable based on knowledge of your content. The nature of the data as longitudinal is not affected by what commands are used to analyze it. And the analysis would be considered longitudinal whether time is included as a variable or not.

      If you want to be very strict about it, you cannot contrast devices A and B because the measurements were made at different times, particularly if you want to model time as a discrete variable. If, however, as a practical matter, the change in these outcomes over time is small enough that a 1 month gap can be disregarded, it could be reasonable to treat the 1.5 month observation as if it were made at 1 month, the 4 month observation as if made at 3 months, the 7 and 9 month observations as if both made at, say, 8 months, etc. Whether this is reasonable depends on how much variation over time is expected--again, this is a scientific rather than a statistical question. (By treating them as if, I mean recoding the times accordingly. If time is entered in the model as a discrete variable, then the A-B differences will not be estimable at any times for which the observations were not synchronized.)
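
      The recoding described here might look something like the sketch below. The tiny dataset, the variable names (id, device, month, Xvalue, period) and all values are hypothetical, and the bin boundaries are simply the ones mentioned in the text; appropriate windows depend on the science.

      ```stata
      clear
      // Hypothetical measurements for two patients (device coded 0 = A, 1 = B)
      input id device month Xvalue
      1 0 0   10.1
      1 0 1   10.3
      1 0 4   10.2
      1 0 9   10.8
      2 1 0    9.9
      2 1 1.5 10.0
      2 1 3   10.4
      2 1 7   10.9
      end

      // Recode actual observation times to nominal periods
      generate byte period = .
      replace period = 0 if month == 0
      replace period = 1 if inrange(month, 1, 1.5)
      replace period = 3 if inrange(month, 3, 4)
      replace period = 8 if inrange(month, 7, 9)

      // Check that no device-by-period cell is empty
      tabulate period device

      // On a real dataset, a discrete-time model could then be fitted as, e.g.:
      // mixed Xvalue i.device##i.period || id:
      ```

      The cross-tabulation is the useful check: any period with a zero count for one device is a cell where the A-B difference cannot be estimated.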

      If you expect your outcome variables to grow linearly over time, then you could just enter month as a continuous time variable in your model and -mixed- will not mind that the observation times are different for the two devices.



      • #4
        Originally posted by Jem Lane
        Sorry, code should have read

        Code:
        mixed Xvalue month || id: device
        From your description of the study (patients treated with one of two devices), you shouldn't be putting device in your random effects equation. From the code below, mixed does seem to converge even with such a grossly misspecified model, but the variance component is zero (what else could it be?).
        Code:
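        version 14.1
        
        clear *
        set more off
        set seed 1345605
        
        // Artificial dataset: 200 patients, alternating between the two devices
        quietly set obs 200
        generate byte device = mod(_n, 2)
        generate int pid = _n
        // Patient-level random intercept and slope, correlated at -0.5
        quietly drawnorm intercept slope, double corr(1 -0.5 \ -0.5 1)
        
        // One outcome per visit: fixed effects of device, month and their
        // interaction, plus random intercept, random slope and residual noise
        foreach month in 0 1 3 6 9 12 15 {
            generate double xyz`month' = 100 ///
                - device / 2 ///
                + `month' / 15 ///
                - device * `month' / 30 ///
                + intercept ///
                + slope * `month' ///
                + rnormal()
        }
        
        quietly reshape long xyz, i(pid) j(month)
        
        // Model 1: random intercept only
        mixed xyz i.device##c.month || pid: , nolrtest nolog
        // Wow! Differential deterioration!
        estimates store Sans
        
        // Model 2: random intercept and random slope on month
        mixed xyz i.device##c.month || pid: month, covariance(unstructured) nolrtest nolog
        // Rats! Zilch . . .
        
        // Model 3: device (wrongly) placed in the random effects equation
        mixed xyz i.device##c.month || pid: device, nolrtest nolog
        // Surprised that it converges at all
        lrtest . Sans
        
        exit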
        If you're interested in differential rate of device failure, or in differential rate of physical or performance deterioration, then you will need to model it in terms of treatment × time interaction. So, regardless of whether you handle time as continuous (i.device##c.month) or discrete (i.device##i.period), you'll want the interaction term in your fixed effects equation.

        If the study is prospective, then its protocol will give "visit windows" that can provide guidance as to how far you can go to call, say, an observation actually done at six weeks a "1-month" observation.

        If your study is done on a so-called convenience sample of patients, then you would probably be more inclined to handle observation interval as continuous. In that case, if you're concerned about linearity in device deterioration over time, then you can consider polynomial expansions of time (e.g., i.device##c.month##c.month) or splines.
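
        A rough sketch of both options, assuming the simulated long-format dataset from the code above (variables pid, device, month, xyz) is still in memory. The spline variable stub mspl and the choice of three knots are arbitrary, for illustration only.

        ```stata
        // Quadratic time trend that is allowed to differ by device
        mixed xyz i.device##c.month##c.month || pid: month, covariance(unstructured) nolog
        // Joint test of the device-by-time interaction terms
        testparm i.device#c.month i.device#c.month#c.month

        // Restricted cubic spline of month with 3 knots: creates mspl1 and mspl2
        mkspline mspl = month, cubic nknots(3)
        mixed xyz i.device##c.(mspl1 mspl2) || pid: month, covariance(unstructured) nolog
        // Joint test of the device-by-spline interaction terms
        testparm i.device#c.(mspl1 mspl2)
        ```

        In either version the joint test of the interaction terms addresses whether the trajectories differ between devices, rather than any single coefficient.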



        • #5
          Dear Clyde,

          Many thanks for your explanation. Yes, I forgot: Level 1 is the individual measurement, and Level 2 is id (individual patient), right?
          That's good to know re: the alternative approaches for 2 level data. I suppose my question about -mixed-, repeated measures and time as a predictor, really stemmed from the apparent lack of need to specify in the syntax that you are analysing repeated measures, i.e. that there is nothing special about the syntax with time as a variable, and the same syntax (with different variables) could equally be used for a non-repeated measures model. Is that right?

          Re: the time variable - I gave examples of two patients with minimal overlap between measurement timepoints, but for the real data (with perhaps 100 patients) there would probably be slightly better, though incomplete, datasets for X,Y,Z at 12, 24, and 36 months for example, with much less complete datasets at other time points. The measurements for each variable X, Y and Z are unlikely to change much at all month to month, except when there is a problem with the device, in which case there may be either gradual or more abrupt changes, which is really what we're interested in.
          When you say:
          If you expect your outcome variables to grow linearly over time, then you could just enter month as a continuous time variable in your model and -mixed- will not mind that the observation times are different for the two devices.
          I thought that mixed effects models were more tolerant of missing data than repeated measures ANOVA, but they nevertheless required:
          i. measurements made at the same timepoints for repeated measures analyses
          ii. 'balanced' missing data

          Do you mean that these conditions apply only if I analyse time (month) as a discrete variable, and that if -mixed- is used with time as a continuous variable, they are no longer required?

          Jem



          • #6
            Well, if you model time discretely and you are modeling device#time interactions, some of those interaction terms will have no supporting observations when the observation times are different across devices. No model, mixed or otherwise, can identify an effect for a cell that has no data. That's all I was pointing out.

            By contrast, if you model time continuously, or if you bin the observation times in the ways suggested earlier, then there are no empty cells to contend with.

            Unlike the classical RM-ANOVA estimators, -mixed- is quite tolerant of missing data in almost any pattern. Even if you set up a pattern where there are empty cells, you will still get usable estimates for effects in the non-empty cells.

            Given that you do not expect much time variation in your outcome, and given that most of your observation periods for the two devices are similar, I think you will be in reasonable shape with -mixed- regardless of how you choose to model time. Indeed, you may find that whichever way you model time, it exerts no appreciable effects (that's my inference from your statement that variation over time is not expected, unless the devices malfunction) and you might end up using a model without reference to time.



            • #7
              Dear Joseph,

              Many thanks for your reply. Yes, I realise now my code was wrong - it's been a while since I used -xtmixed- (with Stata 12). What I meant to write was

              Code:
              mixed Xvalue month || id: month
              so as to create a random slopes model. But as your models point out, this does not include device as a predictor. And as you say, it would be best to have an interaction term. So something like your

              Code:
              mixed xyz i.device##c.month || pid: month, cov(uns)
              was what I should have written.

              What was the code you wrote prior to the three models? Was that a way to generate hypothetical data to test the models with?

              Jem



              • #8
                Thanks Clyde, that's very helpful. Yes, I will also need to analyse rates of 'failure' of each device (in fact, replacement of a component), and will need to give thought as to whether to incorporate time as a factor when doing this (akin to survival analysis). But we are actually interested in the change in values of X,Y and Z over time also, so the discussion above has been very useful.

                I'd be grateful for clarification re: -mixed- models and repeated measures: I've only ever seen mixed effects/multilevel models used for repeated measures analyses, though supposedly you can also use them for measurements at single time points. Is that correct? So, for example, instead of looking at pig weights each month for 12 months in farms A, B and C to examine differences in growth between farms, you could just look at pig weights for each farm at 6 months, without specifying time as a variable. In short, there is no difference in the code between a repeated measures -mixed- model and a non-repeated measures -mixed- model, other than the inclusion of a time variable. Is that right?



                • #9
                  You are correct that you can use -mixed- for non-repeated measures analyses as well. All that matters is that there be nesting of the data: whether it be repeated observations within patients or pigs within farms.

                  I wouldn't endorse your distinguishing repeated measures from non-repeated measures by virtue of including time as an explicit variable. After all, repeated measures could mean, for example, measuring something on the left eye and on the right eye--no time involved. Similarly, one could look at the weights of pigs on farms and analyze whether over the years there has been a secular trend in the weights of the pigs. That analysis would require introducing time into the model, but it would be different pigs in different years, so no repeated measures.
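
                  To illustrate the pigs-within-farms case, here is a small sketch on simulated data. The variable names (farm, u, weight), the numbers of farms and pigs, and the variances are all made up for illustration.

                  ```stata
                  clear
                  set seed 12345

                  // 30 hypothetical farms, each with its own random effect
                  set obs 30
                  generate int farm = _n
                  generate double u = rnormal(0, 2)

                  // 20 pigs per farm, each weighed once (no repeated measures, no time)
                  expand 20
                  generate double weight = 50 + u + rnormal(0, 5)

                  // Random intercept for farm: pigs are nested within farms
                  mixed weight || farm:
                  ```

                  The syntax is the same as for patients with repeated observations; only the meaning of the grouping variable changes.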



                  • #10
                    Thanks Clyde - that has helped me understand something I have wondered about for a while.



                    • #11
                      Originally posted by Jem Lane
                      What was the code you wrote prior to the three models? Was that a way to generate hypothetical data to test the models with?
                      Yes, I should have made that clear, sorry: it was just to create an artificial dataset for use in illustration.
