Announcement

Collapse
No announcement yet.
X
  • Filter
  • Time
  • Show
Clear All
new posts

  • Constrained Longitudinal Data Analysis

    Hello,

    I'am would like to apply a constrained longitudinal data analysis (cLDA) with
    --a binary outcome (0/1)
    --2 groups (randomization groups)
    --2 times points (baseline and post treatment)
    --the constraint that the 2 groups are equal for baseline value (1st time point) for the outcome

    In other words, my question is how to apply (what is the syntax) in stata for the model proposed by Kung-Yee Liang and Scott L. Zeger int heir articleLongitudinal Data Analysis of Continuous and Discrete Responses for Pre-Post Designs, and discussed in other publications :

    Cynthia J Coffman
    To condition or not condition? Analysing ‘change’ in longitudinal randomised controlled trials
    http://https://www.ncbi.nlm.nih.gov/...es/PMC5223669/

    Kaifeng Lu
    On Efficiency of Constrained Longitudinal Data Analysis Versus Longitudinal Analysis of Covariance
    https://onlinelibrary.wiley.com/doi/...0.2009.01332.x

    It don't seems to be that hard but i've gaps with constraints syntax...

  • #2
    I find a way to do it with R : https://datascienceplus.com/taking-t...ined-lda-in-r/
    But on a continuous outcome.

    In order to verify in Stata, i considered my binary (0/1) outcome (reco) to be continuous.
    And if i manage to validate i can get concordant results between Stata and R, i will be able to compute my desirated logistic model with xtmelogit on Stata


    I generated interaction time*group like this :
    Code:
    gen i_rando_visite=rando*visite
    where visite is time and rando is group

    This generated 3 categories :
    1) baseline for randomization group 1 and 2 together (i want them to be equal at this time)
    2) post treatment for andomization group 1
    3) post treatment for andomization group 2

    and computed a mixed model :
    xi:xtmixed reco i.i_rando_visite || id:
    wich lead to estimations nearly (but not) equals to those computed with R.
    So, i think i'am on the good way, but not totaly sure that this is exactly the same...

    Any comments appreciated


    Comment


    • #3
      I have not seen constraints applied to estimation, but I think this syntax would work.

      A note before hand: the -xi- prefix is no longer necessary. We keep seeing people who aren't aware that most estimation commands have supported factor variable syntax without the xi prefix since Stata 11. You also don't have to manually generate interaction terms, and you arguably should not. Also, the -xtmixed- command may still work, but the proper name is now -mixed-. Same for -xtmelogit-.

      Code:
      constraint 1 0.rando#0.visite = 1.rando#0.visite
      mixed reco i.rando##i.visite || id:, constraint(1)
      Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

      When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

      Comment


      • #4
        A code example:

        Code:
        set seed 1001
        clear
        set obs 1000
        gen id = _n
        gen treat = id <= 500
        gen slope = rnormal(0.5*treat,0.25)
        expand 2, generate(time)
        gen y = rnormal() if time == 0
        ttest y if time == 0, by(treat)
        twoway line y time, connect(L) lwidth(vvthin) || lfit y time, lwidth(thick) || , by(treat)
        You should get a non-zero but functionally and statistically negligible difference between groups, but we know that the slopes differ by group, and we can show that with the line graph.

        Now, however, we can see that I made the mistake of not checking my code before posting. It turns out that the constraint is wrongly defined given my proposed syntax, and that we have to fit this model in -gsem-. -mixed- doesn't accept constraints (I wrongly thought it did).

        Code:
        constraint 1 1.treat = 0
        gsem (y <- i.time##i.treat M1[id]), constraint(1)
        estimates store constrained
        gsem (y <- i.time##i.treat M1[id])
        estimates store unconstrained
        estimates table constrained unconstrained
        
        ----------------------------------------
            Variable | constrai~d   unconstr~d  
        -------------+--------------------------
        y            |
                time |
                  1  |  .00018106    .00082082  
                     |
               treat |
                  1  |          0    .00940645  
                     |
          time#treat |
                1 1  |  .49799439    .49671488  
                     |
              M1[id] |          1            1  
                     |
               _cons | -.02487499   -.02957822  
        -------------+--------------------------
          var(M1[id])|  .99205505    .99203594  
             var(e.y)|  .15618924    .15618903  
        ----------------------------------------
        
        . estimates stats constrained unconstrained
        
        Akaike's information criterion and Bayesian information criterion
        
        -----------------------------------------------------------------------------
               Model |        Obs  ll(null)  ll(model)      df         AIC        BIC
        -------------+---------------------------------------------------------------
         constrained |      2,000         .  -2290.006       5    4590.013   4618.017
        unconstrai~d |      2,000         .  -2289.997       6    4591.994   4625.599
        -----------------------------------------------------------------------------
        In both cases, we successfully estimated the mean of the treatment effect (which I specified as 0.5). The constrained model has lower BIC. Constraining 1.treat to equal zero may appear very strange, until you remember that I am fitting the model with an interaction between treatment and time. Hence, 1.treat represents the effect of treatment on the outcome at time 0. I am fairly sure that constraining this to zero is equivalent to saying that both group means at time = 0 are equal, but I would appreciate being corrected if I'm wrong.

        Substantively, I am not sure I buy the rationale for constraining the group means to be equal at time = 0. Lu, in the link above, says

        It may also happen in randomized clinical trials that some randomized subjects have several postbaseline values but fail to provide a baseline value and hence must be excluded from the ANCOVA analysis. For example, a certain amount of missing data may occur at baseline in vaccine trials due to mishandling of the assay or sample processing errors. In general, the baseline value is missing completely at random and hence will not introduce bias into the parameter estimates. However, since the postbaseline values contain information about the treatment differences at postbaseline time points, exclusion of such subjects from analysis may reduce power for detecting treatment differences at postbaseline time points.

        Liang and Zeger (2000) propose a constrained longitudinal data analysis (cLDA) model in which the response vector consists of the baseline value and the values observed at the postbaseline time points and the baseline mean is constrained to be the same across treatment groups. The constraint of equal baseline mean effectively ensures that the baseline value is independent of treatment group. However, by including the baseline value in the response vector, the cLDA model can include all randomized subjects with either a baseline value or a postbaseline value. As a result, the cLDA model may avoid the bias for estimating within‐group mean changes and improve the efficiency for estimating between‐group mean differences at postbaseline time points.
        I'd emphasize that it's extremely important to make sure all subjects are measured at all possible time points. I'm not willing to accept that missing baseline means are missing completely at random. Missing at random conditional on covariates in the model would also be a hard sell, albeit a better assumption than MCAR. (Per my understanding, mixed models are subject to the MAR assumption, and I'm unsure why Lu said MCAR.)
        Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

        When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

        Comment


        • #5
          Hello,
          Thank you for the reply.
          I'am using Stata 12 and commands like mixed and gsem don't work ont this version; I hope a Stata 15 for Christmass



          For dataex, i managed to install it and here is a sample of the data (long form), filtered on the 2 times points 0 and 6 (that are baseline and 1year post treatment respectively). i'll see later to apply the model to all time points.

          Code:
          * Example generated by -dataex-. To install: ssc install dataex
          clear
          input float(id reco visite) byte rando
          1 1 0 2
          1 1 6 2
          2 0 0 2
          2 1 6 2
          3 1 0 1
          3 0 6 1
          4 1 0 2
          4 1 6 2
          5 1 0 1
          5 . 6 1
          end

          I'd emphasize that it's extremely important to make sure all subjects are measured at all possible time points. I'm not willing to accept that missing baseline means are missing completely at random. Missing at random conditional on covariates in the model would also be a hard sell, albeit a better assumption than MCAR. (Per my understanding, mixed models are subject to the MAR assumption, and I'm unsure why Lu said MCAR.)
          I agree, and fortunately, there are no baseline missing value for the outcome, but only on some follow-up measurements. And thats why i want to use this method cause it can handle those missing values by not ignoring all baseline values measured for subjects that have missings in follow-up.


          After a lot of research, i find a suggestion made by Clyde Schechter in this topic https://www.statalist.org/forums/for...n-mixed-models
          saying

          you can use -test- for those constraints with the -coef- option. The -coef- option causes test to display the results of the estimation command with the tested hypotheses applied as constraints.
          and mixed it up with your suggested constraint
          constraint 1 0.rando#0.visite = 1.rando#0.visite
          So i tried :

          Code:
          xtmixed reco i.rando#i.visite || id:
          test  1.rando#0.visite = 2.rando#0.visite, coef
          that gives (for the test output) :

          Code:
           ( 1)  [reco]1b.rando#0b.visite - [reco]2.rando#0b.visite = 0
          
                     chi2(  1) =    1.53
                   Prob > chi2 =    0.2155
          
          
          Constrained coefficients
          
          ------------------------------------------------------------------------------
                       |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
          reco         |
          rando#visite |
                  1 6  |   .3706181   .0553817     6.69   0.000      .262072    .4791643
                  2 0  |          0  (omitted)
                  2 6  |   .5106915    .058327     8.76   0.000     .3963727    .6250102
                       |
                 _cons |   .3114035   .0296658    10.50   0.000     .2532597    .3695473
          -------------+----------------------------------------------------------------
          lns1_1_1     |
                 _cons |  -3.101014   3.893901    -0.80   0.426    -10.73292    4.530891
          -------------+----------------------------------------------------------------
          lnsig_e      |
                 _cons |  -.8081613   .0529676   -15.26   0.000    -.9119759   -.7043468

          But i'am unsure, cause the test command was aimed at a coefficient constraint
          and 2 how to interpret what i really need, namely the comparison of my outcome between the randomization group 1 vs 2 at the post treatment time point
          and 3 what does the
          Prob > chi2 = 0.2155
          mean ?

          Here i just have
          --baseline estimations : 3114035 for the 2 groups
          --intragroups postreatment improvements : +0.3706181 for group 1, +0.5106915 for group 2

          and i can't find a way to apply contrast or margins on those estimates constrained in the test command (contrast and margins apply to the native unconstrained model)


          In other hand, like i previously said, i tested the 3 groups
          1) baseline for randomization group 1 and 2 together (i want them to be equal at this time)
          2) post treatment for andomization group 1
          3) post treatment for andomization group 2
          Code:
           gen i_rando_visite=rando*visite if visite==0 | visite==6
          xtmixed reco i.i_rando_visite || id: if visite==0 | visite==6
          that gives nearly equals coefficients :

          Code:
          Mixed-effects ML regression                     Number of obs      =       398
          Group variable: id                              Number of groups   =       228
          
                                                          Obs per group: min =         1
                                                                         avg =       1.7
                                                                         max =         2
          
          
                                                          Wald chi2(2)       =     96.71
          Log likelihood = -245.86507                     Prob > chi2        =    0.0000
          
          --------------------------------------------------------------------------------
                    reco |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          ---------------+----------------------------------------------------------------
          i_rando_visite |
                      6  |   .3706205   .0554878     6.68   0.000     .2618664    .4793746
                     12  |   .5106891   .0584387     8.74   0.000     .3961513     .625227
                         |
                   _cons |   .3114035   .0297229    10.48   0.000     .2531477    .3696593
          --------------------------------------------------------------------------------
          
          ------------------------------------------------------------------------------
            Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
          -----------------------------+------------------------------------------------
          id: Identity                 |
                             sd(_cons) |   .0451666   .1760276      .0000217    93.80042
          -----------------------------+------------------------------------------------
                          sd(Residual) |   .4465272   .0237075      .4023973    .4954968
          ------------------------------------------------------------------------------
          LR test vs. linear regression: chibar2(01) =     0.02 Prob >= chibar2 = 0.4489
          
          . test 6.i_rando_visite=12.i_rando_visite
          
           ( 1)  [reco]6.i_rando_visite - [reco]12.i_rando_visite = 0
          
                     chi2(  1) =    4.12
                   Prob > chi2 =    0.0424
          But here i can test the 1 year difference between groups that gives a significant p=0.04


          And lastly here are the results obtained with R

          Code:
          Generalized least squares fit by REML
            Model: reco ~ Xalt
            Data: thermactive06
                 AIC      BIC    logLik
            516.3054 540.1787 -252.1527
          
          Correlation Structure: General
           Formula: ~1 | id
           Parameter estimate(s):
           Correlation:
            1   
          2 0.01
          Variance function:
           Structure: Different standard deviations per stratum
           Formula: ~1 | visite
           Parameter estimates:
                  0         6
          1.0000000 0.9297394
          
          Coefficients:
                                 Value  Std.Error   t-value p-value
          (Intercept)        0.3114035 0.03073486 10.131931  0.0000
          Xaltvisite6        0.3705615 0.05452385  6.796319  0.0000
          Xaltvisite6:rando2 0.1401856 0.06634811  2.112881  0.0352
          
           Correlation:
                             (Intr) Xltvs6
          Xaltvisite6        -0.558       
          Xaltvisite6:rando2  0.000 -0.565
          
          Standardized residuals:
                 Min         Q1        Med         Q3        Max
          -1.9054231 -0.6710035 -0.6710035  0.7370806  1.4837683
          
          Residual standard error: 0.4640863
          Degrees of freedom: 398 total; 395 residual
          that gives also nearly equal coefficients
          intercept (baseline) =0.3114035
          group 1 post treatment= +0.3705615
          group 2 post treatment= +0.3705615+0.140185 = 0.5107465

          and the groups difference = 0.140185 with p=0.0352





          Comment


          • #6
            Aurelien,

            I am not sure how to interpret a mixed model in the syntax you gave. I've prefered to specify the full interaction. Here's selected output with your data example. Note the two ##s in the factor variable syntax.

            Code:
            version 12
            xtmixed reco i.rando##i.visite || id:
            
            ------------------------------------------------------------------------------
                    reco |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
                 2.rando |  -.3333333    .248452    -1.34   0.180    -.8202903    .1536236
                6.visite |         -1   .3333332    -3.00   0.003    -1.653321   -.3466788
                         |
            rando#visite |
                    2 6  |   1.333333   .4006167     3.33   0.001      .548139    2.118528
                         |
                   _cons |          1   .1924501     5.20   0.000     .6228048    1.377195
            ------------------------------------------------------------------------------
            
            ------------------------------------------------------------------------------
              Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
            -----------------------------+------------------------------------------------
            id: Identity                 |
                               sd(_cons) |    .000237   .0021194      5.78e-12    9710.597
            -----------------------------+------------------------------------------------
                            sd(Residual) |   .2721654   .0641501      .1714757    .4319797
            ------------------------------------------------------------------------------
            
            test 2.rando = 0, coef
             ( 1)  [reco]2.rando = 0
            
                       chi2(  1) =    1.80
                     Prob > chi2 =    0.1797
            
            
            Constrained coefficients
            
            ------------------------------------------------------------------------------
                         |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
            -------------+----------------------------------------------------------------
            reco         |
                 2.rando |          0  (omitted)
                6.visite |  -.8000002   .2981424    -2.68   0.007    -1.384348   -.2156519
                         |
            rando#visite |
                    2 6  |          1   .3142697     3.18   0.001      .384043    1.615958
                         |
                   _cons |         .8   .1217161     6.57   0.000     .5614408    1.038559
            -------------+----------------------------------------------------------------
            Again, with my syntax, constraining 2.rando to be equal to zero means that you are constraining 2.rando at time = 0 to be zero. (Because it's an interacted model.) You're interested in the between-group difference at 6 months. in the first output using your example data, it's rando2#visite6, which has a beta of 1.333 (bolded line). With that constraint, my understanding of the -test ..., coef- output is that the between-group difference at 6 months is now 1. I think that was what you were interested in.

            With your syntax, you'd have to use the linear combination command if you wanted to get -mixed- to give the between group difference at 6 months. I haven't reviewed your own syntax, because I see you are manually generating interaction terms, and I would rather let the computer do it for me.

            If you want to test the above syntax in my data example, you may have noticed that it won't work. That's because I forgot the code that generated y at time = 1. My apologies! The corrected code for my fictitious example is:

            Code:
            set seed 1001
            clear
            set obs 1000
            gen id = _n
            gen treat = id <= 500
            gen slope = rnormal(0.5*treat,0.25)
            expand 2, generate(time)
            gen y = rnormal() if time == 0
            sort id time
            by id: replace y = y[_n-1] + slope + rnormal(0,0.5) if  _n == 2
            mixed y i.time##i.treat || id:
            test 1.time = 0, coef
            Last edited by Weiwen Ng; 16 Nov 2018, 15:51.
            Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

            When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

            Comment


            • #7
              Hello,
              Thank you for the reply.
              Using the syntax you suggested, stata computes


              Code:
              Constrained coefficients
              
              ------------------------------------------------------------------------------
                           |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
              reco         |
                   2.rando |          0  (omitted)
                  6.visite |   .3706172    .055382     6.69   0.000     .2620705    .4791638
                           |
              rando#visite |
                     2 6  |   .1400753   .0688797     2.03   0.042     .0050736     .275077
                           |
                     _cons |   .3114035   .0296658    10.50   0.000     .2532597    .3695473
              -------------+----------------------------------------------------------------
              lns1_1_1     |
                     _cons |  -3.101718   3.901959    -0.79   0.427    -10.74942    4.545982
              -------------+----------------------------------------------------------------
              lnsig_e      |
                     _cons |  -.8081542   .0529875   -15.25   0.000    -.9120078   -.7043006
              And ok the between group effect post-treatment (bolded line) is significant p=0.042 with a difference = 0.14 (that i calculated 0.51-0.37 previously)
              And that is a result near to be (but not) equal to R result.

              After examinating more in detail the R code, i tried to adapt it more accurately in stata and i think, that with the commands :
              Code:
              xtmixed reco i.rando##i.visite   || id:visite, covariance(unstructured)
              test 2.rando = 0, coef
              
              
              
              Constrained coefficients
              
              ------------------------------------------------------------------------------
                           |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
              -------------+----------------------------------------------------------------
              reco         |
                   2.rando |          0  (omitted)
                  6.visite |   .3705616   .0542116     6.84   0.000     .2643088    .4768145
                           |
              rando#visite |
                      2 6  |   .1401853   .0659567     2.13   0.034     .0109125    .2694581
                           |
                     _cons |   .3114035   .0305707    10.19   0.000     .2514861    .3713209
              -------------+----------------------------------------------------------------
              lns1_1_1     |
                     _cons |  -2.348813   27.87619    -0.08   0.933    -56.98515    52.28752
              -------------+----------------------------------------------------------------
              lns1_1_2     |
                     _cons |  -.8556513   25.32567    -0.03   0.973    -50.49305    48.78174
              -------------+----------------------------------------------------------------
              atr1_1_1_2   |
                     _cons |  -.9367696   3.163538    -0.30   0.767     -7.13719    5.263651
              -------------+----------------------------------------------------------------
              lnsig_e      |
                     _cons |  -1.714016   140.9697    -0.01   0.990    -278.0096    274.5816
              The results trend to be more equal -stil not- but i think that the small difference is neglectable

              Many thanks for the help
              Last edited by aurelien mulliez; 22 Nov 2018, 03:40.

              Comment

              Working...
              X