
  • Interpret longitudinal mixed models

    I recently ran the code below for a longitudinal mixed model. The variable (time) takes the values 1, 2, 3, which represent the 3 measurement visits. What I want to know is the effect of a change in x on Y. The results I get are shown below. I would like to know how to interpret these results and what I should report in my paper. I also worry that I have made some mistake, because if I run a standard linear regression I get exactly the same estimates. I hope someone can help.

    xtmixed Y i.x_change education_level_bl bmi_BL smoke_status_BL || time:

    Performing EM optimization:

    Performing gradient-based optimization:

    Iteration 0: log likelihood = -946.13
    Iteration 1: log likelihood = -946.05813
    Iteration 2: log likelihood = -946.05732
    Iteration 3: log likelihood = -946.05732

    Computing standard errors:

    Mixed-effects ML regression                     Number of obs     =        825
    Group variable: time                            Number of groups  =          3

                                                    Obs per group:
                                                                  min =        275
                                                                  avg =      275.0
                                                                  max =        275

                                                    Wald chi2(4)      =      55.84
    Log likelihood = -946.05732                     Prob > chi2       =     0.0000

    ------------------------------------------------------------------------------------
                     Y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------------+----------------------------------------------------------------
              x_change |     .29579   .0668334     4.43   0.000      .164799    .4267809
    education_level_bl |  -.0229858   .0164508    -1.40   0.162    -.0552288    .0092572
                bmi_BL |   .0378848   .0080575     4.70   0.000     .0220923    .0536773
       smoke_status_BL |   .0648095   .0320243     2.02   0.043      .002043     .127576
                 _cons |   1.948898   .2228993     8.74   0.000     1.512023    2.385772
    ------------------------------------------------------------------------------------

    ------------------------------------------------------------------------------
      Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
    -----------------------------+------------------------------------------------
    time: Identity               |
                        sd(_cons) |   6.65e-12   8.43e-11      1.09e-22    .4053042
    -----------------------------+------------------------------------------------
                     sd(Residual) |   .7617001   .0187524      .7258187    .7993553
    ------------------------------------------------------------------------------
    LR test vs. linear model: chibar2(01) = 0.00          Prob >= chibar2 = 1.0000


  • #2
    This does not look like a sensible model to me. While I can imagine a study design for which this model would make sense, I have never encountered such a situation in real life and will be astonished if I ever do. Since you refer to this as a "longitudinal" model, can I assume that these data come from the same 275 people being observed repeatedly at three time points?

    On that assumption, a plausible model would be to have the person ID, not the time period, as the top-level variable, and with time represented as a bottom-level variable ("fixed effect").

    Code:
    mixed Y i.x_change education_level_bl bmi_BL smoke_status_BL i.time || person_id:
    (Evidently, replace person_id by the actual variable that identifies each person.)

    Try that and see if you don't get more sensible findings. If you need help with the interpretation, do post back showing those results. When you do, please enclose the results in code delimiters so that they will align in the most readable fashion when displayed.



    • #3
      Dear Clyde, thank you for your response.

      You are correct that I have three measurement time points: baseline, 5-year and 10-year follow-up.

      I have tried what you suggested, but I just get never-ending lines of the following on repeat:
      Iteration 0: log likelihood = 5163.8876
      Iteration 1: log likelihood = 17178.126 (not concave)
      Iteration 2: log likelihood = 17180.118 (not concave)
      Iteration 3: log likelihood = 17180.193 (not concave)
      Iteration 4: log likelihood = 17180.198 (not concave)
      Iteration 5: log likelihood = 17180.199 (not concave)
      Iteration 6: log likelihood = 17180.199 (not concave)
      Iteration 7: log likelihood = 17180.199 (not concave)
      Iteration 8: log likelihood = 17180.199 (not concave)
      Iteration 9: log likelihood = 17180.199 (not concave)
      Iteration 10: log likelihood = 17180.199 (not concave)
      Iteration 11: log likelihood = 17180.199 (not concave)

      I am concerned that either my dataset is not set up correctly somehow or I have issues with my variables, because surely this should not be happening.



      • #4
        Run it again with the following slight modification:
        Code:
        mixed Y i.x_change education_level_bl bmi_BL smoke_status_BL i.time || person_id:, iterate(7)
        and show the output. This may enable us to see what is interfering with convergence.

        That said, another thing to look at is whether your data are correct. In particular, both this result and what you show in #1 raise my suspicion that your Y variable does not change from one time period to the next within person. That would produce both of these problems, and a multi-level analysis would be inappropriate for such data (or, depending on what Y actually represents, it might mean that your data are erroneous).
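
        One quick way to check that is sketched below (it assumes your person identifier is named person_id, and the new variable names are purely illustrative; replace them as needed):
        Code:
        * flag people whose Y never changes across their observations
        bysort person_id: egen y_min = min(Y)
        bysort person_id: egen y_max = max(Y)
        generate byte y_constant = (y_min == y_max)
        * tabulate once per person rather than once per observation
        bysort person_id: generate byte first_obs = (_n == 1)
        tabulate y_constant if first_obs
        If nearly everyone comes out with y_constant = 1, then Y really does not vary within person, and a multi-level model would not be appropriate for these data.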



        • #5
          I have attached the results table below in a document; I hope you can see it.
          stata_results_table.docx




          • #6
            Please note the Forum FAQ, where it is pointed out that attachments are deprecated here. Word documents, in particular, can contain active malware and some of us, myself included, will not download such files coming from people we don't know. I imagine you chose to put the results in a Word document in response to my request in #2 for more readable output. And I appreciate that. But the safe way to do that is to simply copy the results from the Results window or your log file directly into the editor between code delimiters. If you are not familiar with code delimiters, you will find an explanation of how to use them in Forum FAQ #12.



            • #7
              My apologies. I did try to attach it as per your previous request; however, I have now pasted the results between code delimiters as per FAQ #12. I hope this is how you want it displayed.

              Code:
               mixed y x education_level_bl bmi_BL smoke_status_BL || Patient_ID:, iterate(7)
              
              Performing EM optimization: 
              
              Performing gradient-based optimization: 
              
              Iteration 0:   log likelihood = -2563.6189  
              Iteration 1:   log likelihood = -2563.6189  
              
              Computing standard errors:
              
              Mixed-effects ML regression                     Number of obs     =      2,782
              Group variable: Patient_ID                      Number of groups  =      1,018
              
                                                              Obs per group:
                                                                            min =          1
                                                                            avg =        2.7
                                                                            max =         60
              
                                                              Wald chi2(4)      =     545.66
              Log likelihood = -2563.6189                     Prob > chi2       =     0.0000
              
              ------------------------------------------------------------------------------------
                               y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------------+----------------------------------------------------------------
                               x |   2.643536   .1144002    23.11   0.000     2.419316    2.867757
              education_level_bl |  -.0122485   .0120417    -1.02   0.309    -.0358499    .0113529
                          bmi_BL |   .0029455   .0036048     0.82   0.414    -.0041197    .0100107
                 smoke_status_BL |   .0261968   .0198187     1.32   0.186    -.0126472    .0650408
                           _cons |   1.905694   .1093281    17.43   0.000     1.691415    2.119973
              ------------------------------------------------------------------------------------
              
              ------------------------------------------------------------------------------
                Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
              -----------------------------+------------------------------------------------
              Patient_ID: Identity         |
                                var(_cons) |   .3056463   .0191935      .2702507    .3456779
              -----------------------------+------------------------------------------------
                             var(Residual) |    .228785   .0076261      .2143158     .244231
              ------------------------------------------------------------------------------
              LR test vs. linear model: chibar2(01) = 1794.62       Prob >= chibar2 = 0.0000



              • #8
                So according to this model, a 1-unit difference in x is estimated to be associated with a 2.64-unit difference (in the same direction) in Y. The model and data are compatible with that difference in Y being in the range from 2.42 to 2.87 units (from the 95% CI). My ability to interpret this in a more refined way is limited because you have disclosed no information about x and Y. For example, if the variable x only ranges between, say, 0 and 0.01, then a 1-unit difference in x is not even possible and the associated Y difference would be meaningless and speculative. Similarly, if the values of x range between 0 and 100,000, say, then a 1-unit difference in x might well be within the range of measurement error, so again, the associated difference in Y means little or nothing. On the other hand, knowing nothing about Y, it is impossible to say whether a difference of 2.64 units is massive, meaningful but not overwhelming, borderline meaningful, or too small to be worth talking about. So you will have to apply your knowledge about x and Y to put the "finishing touches" on the interpretation.
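
                If it helps, a quick look at the distributions of the two variables will tell you whether a 1-unit difference in x is even a realistic contrast (a sketch, using the variable names from your command in #7):
                Code:
                * inspect the range, spread, and percentiles of x and y
                summarize x y, detail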

                I do have another question about this model. You have not included time as a variable here. And that may be perfectly appropriate, again depending on what x and Y are. But be careful here. If x and Y are themselves things that have trends over time, then this model may be allowing x to "take credit" for what is really just the normal time trend in Y. So you need to think about that.



                • #9
                  Thank you Clyde. If I were to include a time variable, could this be the variable I have as (time), which indexes the individual assessments themselves, or would it be, for example, the difference in time between the final measurement and the first? I am also confused by the number of observations and number of groups. I begin with 1,044 individuals who were followed for 10 years and measured 3 times: baseline, 5 years and 10 years. Not all 1,044 attended every measurement, and in my models I apply some restrictions, so I have different numbers of individuals depending on whether I analyse over 5 or 10 years. I understand this may not be answerable if you can't see the data yourself, but I thought I would ask anyway.



                  • #10
                    In this case, the three measurements are equally spaced in real time. If you enter time as a continuous variable, you could code it either as 1, 2, 3 or as 0, 5, 10; the results will be the same either way apart from a factor of 5 on the time coefficient and a change in the constant. If instead you add i.time to the model as indicators, the coding makes no difference at all. Since your concern is whether the inclusion of the time variable changes the x coefficient, rather than the time coefficients themselves, suit yourself.
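
                    Concretely, the two ways of entering time would look something like this (a sketch based on your command in #7; adjust the variable names as needed):
                    Code:
                    * time as indicators: the 1,2,3 vs 0,5,10 coding makes no difference
                    mixed y x education_level_bl bmi_BL smoke_status_BL i.time || Patient_ID:

                    * time as a continuous trend: the coefficient scales with the units of time
                    mixed y x education_level_bl bmi_BL smoke_status_BL c.time || Patient_ID: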

                    The number of "groups" in the regression output corresponds to the number of distinct patient_IDs included in the regression estimations, because your "group" variable is patient_ID. The number of observations will be the total number of records included in the estimation. Now, looking at the results shown in #7, something seems quite wrong. You state that each patient was measured 3 times. But the number of observations per patient ranges up as high as 60! So that is inconsistent and suggests that your data is not what you think it is. Without seeing the data, I cannot say more than that. But it seems like you have a lot of "surplus" observations. If you run -duplicates tag patient_ID time, gen(flag)- and -browse if flag-, you will get to see those excess observations and then you will have to figure out how they got there and what to do to correctly eliminate them.



                    • #11
                      OK, great, thank you for the clarity on that. This may be because the data have missing values and I used multiple imputation. Therefore, maybe there should be an mi version of the mixed command...



                      • #12
                        Therefore, maybe there should be an mi version of the mixed command...
                        Stata doesn't have mi versions of any commands. Multiple imputation is done by first creating a multiply imputed data set using the suite of -mi- commands, and then carrying out the intended analysis with the -mi estimate:- prefix. Only some regression commands are supported by -mi estimate-, but -mixed- is one of them.
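
                        In outline, the first (imputation) step looks something like the sketch below; which variables you register and impute, and the imputation models used, are specific to your data (the ones shown here are purely illustrative). The second step is the -mi estimate:- command shown just below.
                        Code:
                        * declare the mi style, register the variables with missing values,
                        * and create (say) 20 imputations -- illustrative specification only
                        mi set flong
                        mi register imputed bmi_BL smoke_status_BL
                        mi impute chained (regress) bmi_BL (logit) smoke_status_BL = y x education_level_bl, add(20) rseed(12345)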

                        It may be that the multiple imputations are the reason you have so many more observations per Patient_ID than the three that would be expected with three time points. Perhaps you are using a multiply imputed data set. If that is the case, then the analysis you need is:
                        Code:
                        mi estimate: mixed y x education_level_bl bmi_BL smoke_status_BL || Patient_ID:
                        (Note: The -iterate(7)- option I recommended earlier was solely for the purposes of diagnosing what was causing the non-convergence. Now that you have a convergent model, that option is no longer necessary, and in fact might invalidate some of the multiply imputed analyses. There is no guarantee that they will always converge within 7 iterations.)



                        • #13
                          Ah, of course, yes. Would you also include the i.time variable? I have pasted the table of new results below.

                          Code:
                          mi estimate: mixed y x education_level_bl bmi_BL smoke_status_BL i.time || Patient_ID:
                          
                          Multiple-imputation estimates                   Imputations       =         20
                          Mixed-effects ML regression                     Number of obs     =      1,984
                          
                          Group variable: Patient_ID                      Number of groups  =      1,018
                                                                          Obs per group:
                                                                                        min =          1
                                                                                        avg =        1.9
                                                                                        max =          3
                                                                          Average RVI       =     0.0232
                                                                          Largest FMI       =     0.1666
                          DF adjustment:   Large sample                   DF:     min       =     704.70
                                                                                  avg       =   1.76e+09
                                                                                  max       =   1.57e+10
                          Model F test:       Equal FMI                   F(   6,95736.4)   =     150.55
                                                                          Prob > F          =     0.0000
                          
                          ------------------------------------------------------------------------------------
                                           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
                          -------------------+----------------------------------------------------------------
                                           x |   1.358363    .134515    10.10   0.000     1.094718    1.622008
                          education_level_bl |  -.0236345   .0118677    -1.99   0.046    -.0468948   -.0003742
                                      bmi_BL |   .0102655   .0051097     2.01   0.045     .0002334    .0202976
                             smoke_status_BL |   .0610179   .0237433     2.57   0.010     .0144818     .107554
                                             |
                                        time |
                                          5  |   .4889857    .027121    18.03   0.000     .4358294     .542142
                                         10  |   .6987189   .0388732    17.97   0.000     .6225289    .7749089
                                             |
                                       _cons |   1.724158   .1473666    11.70   0.000     1.434947    2.013368
                          ------------------------------------------------------------------------------------
                          
                          ------------------------------------------------------------------------------
                            Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
                          -----------------------------+------------------------------------------------
                          Patient_ID: Identity         |
                                             sd(_cons) |   .4767238   .0204762       .438234    .5185942
                          -----------------------------+------------------------------------------------
                                          sd(Residual) |   .4972158   .0121058      .4740461     .521518
                          ------------------------------------------------------------------------------



                          • #14
                            It is hard to say how relevant the time variables are. I would say that in most situations, it is wise to include them. That's a generic preference I have in modeling longitudinal data. But sometimes their effects are small enough that they can be ignored. Since you haven't said what y is, I can't tell whether differences in y on the order of magnitude of .4 to .8 (looking at the confidence intervals of those coefficients) are meaningful or not. The confidence intervals are rather well bounded away from 0, so one might lean towards including them on grounds of "statistical significance," but the sample size is pretty ample here and it is entirely possible for a difference that is meaningless from a pragmatic perspective to be "statistically significant." I don't recommend allowing statistical significance to trump real-world meaningfulness in these matters.



                            • #15
                              Y is an index of health status ranging from 0 to 1.
                              As you stated earlier, according to this model a 1-unit difference in x (which is a protein value) is estimated to be associated with a 1.36-unit difference (in the same direction) in Y.
                              Would this be over the 10-year follow-up? Should I interpret it as: a 1-unit difference in x over 10 years is associated with a 1.36-unit difference in Y at the 10-year follow-up?

