Constrained Longitudinal Data Analysis

aurelien mulliez

Join Date: Nov 2018

Posts: 14
#1

Constrained Longitudinal Data Analysis

14 Nov 2018, 09:09

Hello,

I'am would like to apply a constrained longitudinal data analysis (cLDA) with
--a binary outcome (0/1)
--2 groups (randomization groups)
--2 times points (baseline and post treatment)
--the constraint that the 2 groups are equal for baseline value (1st time point) for the outcome

In other words, my question is how to apply (what is the syntax) in stata for the model proposed by Kung-Yee Liang and Scott L. Zeger int heir articleLongitudinal Data Analysis of Continuous and Discrete Responses for Pre-Post Designs, and discussed in other publications :

Cynthia J Coffman
To condition or not condition? Analysing ‘change’ in longitudinal randomised controlled trials
http://https://www.ncbi.nlm.nih.gov/...es/PMC5223669/

Kaifeng Lu
On Efficiency of Constrained Longitudinal Data Analysis Versus Longitudinal Analysis of Covariance
https://onlinelibrary.wiley.com/doi/...0.2009.01332.x

It don't seems to be that hard but i've gaps with constraints syntax...
Tags: None
aurelien mulliez

Join Date: Nov 2018

Posts: 14
#2

15 Nov 2018, 04:04

I find a way to do it with R : https://datascienceplus.com/taking-t...ined-lda-in-r/
But on a continuous outcome.

In order to verify in Stata, i considered my binary (0/1) outcome (reco) to be continuous.
And if i manage to validate i can get concordant results between Stata and R, i will be able to compute my desirated logistic model with xtmelogit on Stata

I generated interaction time*group like this :

Code:

gen i_rando_visite=rando*visite

where visite is time and rando is group

This generated 3 categories :
1) baseline for randomization group 1 and 2 together (i want them to be equal at this time)
2) post treatment for andomization group 1
3) post treatment for andomization group 2

and computed a mixed model :

xi:xtmixed reco i.i_rando_visite || id:

wich lead to estimations nearly (but not) equals to those computed with R.
So, i think i'am on the good way, but not totaly sure that this is exactly the same...

Any comments appreciated
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#3

15 Nov 2018, 07:14

I have not seen constraints applied to estimation, but I think this syntax would work.

A note before hand: the -xi- prefix is no longer necessary. We keep seeing people who aren't aware that most estimation commands have supported factor variable syntax without the xi prefix since Stata 11. You also don't have to manually generate interaction terms, and you arguably should not. Also, the -xtmixed- command may still work, but the proper name is now -mixed-. Same for -xtmelogit-.

Code:

constraint 1 0.rando#0.visite = 1.rando#0.visite mixed reco i.rando##i.visite || id:, constraint(1)

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment
Weiwen Ng

Join Date: Jun 2015

Posts: 1241
#4

15 Nov 2018, 11:14

A code example:

Code:

set seed 1001 clear set obs 1000 gen id = _n gen treat = id <= 500 gen slope = rnormal(0.5*treat,0.25) expand 2, generate(time) gen y = rnormal() if time == 0 ttest y if time == 0, by(treat) twoway line y time, connect(L) lwidth(vvthin) || lfit y time, lwidth(thick) || , by(treat)

You should get a non-zero but functionally and statistically negligible difference between groups, but we know that the slopes differ by group, and we can show that with the line graph.

Now, however, we can see that I made the mistake of not checking my code before posting. It turns out that the constraint is wrongly defined given my proposed syntax, and that we have to fit this model in -gsem-. -mixed- doesn't accept constraints (I wrongly thought it did).

Code:

constraint 1 1.treat = 0 gsem (y <- i.time##i.treat M1[id]), constraint(1) estimates store constrained gsem (y <- i.time##i.treat M1[id]) estimates store unconstrained estimates table constrained unconstrained ---------------------------------------- Variable | constrai~d unconstr~d -------------+-------------------------- y | time | 1 | .00018106 .00082082 | treat | 1 | 0 .00940645 | time#treat | 1 1 | .49799439 .49671488 | M1[id] | 1 1 | _cons | -.02487499 -.02957822 -------------+-------------------------- var(M1[id])| .99205505 .99203594 var(e.y)| .15618924 .15618903 ---------------------------------------- . estimates stats constrained unconstrained Akaike's information criterion and Bayesian information criterion ----------------------------------------------------------------------------- Model | Obs ll(null) ll(model) df AIC BIC -------------+--------------------------------------------------------------- constrained | 2,000 . -2290.006 5 4590.013 4618.017 unconstrai~d | 2,000 . -2289.997 6 4591.994 4625.599 -----------------------------------------------------------------------------

In both cases, we successfully estimated the mean of the treatment effect (which I specified as 0.5). The constrained model has lower BIC. Constraining 1.treat to equal zero may appear very strange, until you remember that I am fitting the model with an interaction between treatment and time. Hence, 1.treat represents the effect of treatment on the outcome at time 0. I am fairly sure that constraining this to zero is equivalent to saying that both group means at time = 0 are equal, but I would appreciate being corrected if I'm wrong.

Substantively, I am not sure I buy the rationale for constraining the group means to be equal at time = 0. Lu, in the link above, says

It may also happen in randomized clinical trials that some randomized subjects have several postbaseline values but fail to provide a baseline value and hence must be excluded from the ANCOVA analysis. For example, a certain amount of missing data may occur at baseline in vaccine trials due to mishandling of the assay or sample processing errors. In general, the baseline value is missing completely at random and hence will not introduce bias into the parameter estimates. However, since the postbaseline values contain information about the treatment differences at postbaseline time points, exclusion of such subjects from analysis may reduce power for detecting treatment differences at postbaseline time points.

Liang and Zeger (2000) propose a constrained longitudinal data analysis (cLDA) model in which the response vector consists of the baseline value and the values observed at the postbaseline time points and the baseline mean is constrained to be the same across treatment groups. The constraint of equal baseline mean effectively ensures that the baseline value is independent of treatment group. However, by including the baseline value in the response vector, the cLDA model can include all randomized subjects with either a baseline value or a postbaseline value. As a result, the cLDA model may avoid the bias for estimating within‐group mean changes and improve the efficiency for estimating between‐group mean differences at postbaseline time points.

I'd emphasize that it's extremely important to make sure all subjects are measured at all possible time points. I'm not willing to accept that missing baseline means are missing completely at random. Missing at random conditional on covariates in the model would also be a hard sell, albeit a better assumption than MCAR. (Per my understanding, mixed models are subject to the MAR assumption, and I'm unsure why Lu said MCAR.)

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.
Comment

aurelien mulliez

Join Date: Nov 2018
Posts: 14

16 Nov 2018, 03:22

Hello,
Thank you for the reply.
I'am using Stata 12 and commands like mixed and gsem don't work ont this version; I hope a Stata 15 for Christmass

For dataex, i managed to install it and here is a sample of the data (long form), filtered on the 2 times points 0 and 6 (that are baseline and 1year post treatment respectively). i'll see later to apply the model to all time points.

Code:

* Example generated by -dataex-. To install: ssc install dataex
clear
input float(id reco visite) byte rando
1 1 0 2
1 1 6 2
2 0 0 2
2 1 6 2
3 1 0 1
3 0 6 1
4 1 0 2
4 1 6 2
5 1 0 1
5 . 6 1
end

I'd emphasize that it's extremely important to make sure all subjects are measured at all possible time points. I'm not willing to accept that missing baseline means are missing completely at random. Missing at random conditional on covariates in the model would also be a hard sell, albeit a better assumption than MCAR. (Per my understanding, mixed models are subject to the MAR assumption, and I'm unsure why Lu said MCAR.)

I agree, and fortunately, there are no baseline missing value for the outcome, but only on some follow-up measurements. And thats why i want to use this method cause it can handle those missing values by not ignoring all baseline values measured for subjects that have missings in follow-up.

After a lot of research, i find a suggestion made by Clyde Schechter in this topic https://www.statalist.org/forums/for...n-mixed-models
saying

you can use -test- for those constraints with the -coef- option. The -coef- option causes test to display the results of the estimation command with the tested hypotheses applied as constraints.

and mixed it up with your suggested constraint

constraint 1 0.rando#0.visite = 1.rando#0.visite

So i tried :

Code:

xtmixed reco i.rando#i.visite || id:
test  1.rando#0.visite = 2.rando#0.visite, coef

that gives (for the test output) :

Code:

 ( 1)  [reco]1b.rando#0b.visite - [reco]2.rando#0b.visite = 0

           chi2(  1) =    1.53
         Prob > chi2 =    0.2155


Constrained coefficients

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
reco         |
rando#visite |
        1 6  |   .3706181   .0553817     6.69   0.000      .262072    .4791643
        2 0  |          0  (omitted)
        2 6  |   .5106915    .058327     8.76   0.000     .3963727    .6250102
             |
       _cons |   .3114035   .0296658    10.50   0.000     .2532597    .3695473
-------------+----------------------------------------------------------------
lns1_1_1     |
       _cons |  -3.101014   3.893901    -0.80   0.426    -10.73292    4.530891
-------------+----------------------------------------------------------------
lnsig_e      |
       _cons |  -.8081613   .0529676   -15.26   0.000    -.9119759   -.7043468

But i'am unsure, cause the test command was aimed at a coefficient constraint
and 2 how to interpret what i really need, namely the comparison of my outcome between the randomization group 1 vs 2 at the post treatment time point
and 3 what does the

Prob > chi2 = 0.2155

mean ?

Here i just have
--baseline estimations : 3114035 for the 2 groups
--intragroups postreatment improvements : +0.3706181 for group 1, +0.5106915 for group 2

and i can't find a way to apply contrast or margins on those estimates constrained in the test command (contrast and margins apply to the native unconstrained model)

In other hand, like i previously said, i tested the 3 groups

1) baseline for randomization group 1 and 2 together (i want them to be equal at this time)
2) post treatment for andomization group 1
3) post treatment for andomization group 2

Code:

 gen i_rando_visite=rando*visite if visite==0 | visite==6
xtmixed reco i.i_rando_visite || id: if visite==0 | visite==6

that gives nearly equals coefficients :

Code:

Mixed-effects ML regression                     Number of obs      =       398
Group variable: id                              Number of groups   =       228

                                                Obs per group: min =         1
                                                               avg =       1.7
                                                               max =         2


                                                Wald chi2(2)       =     96.71
Log likelihood = -245.86507                     Prob > chi2        =    0.0000

--------------------------------------------------------------------------------
          reco |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
i_rando_visite |
            6  |   .3706205   .0554878     6.68   0.000     .2618664    .4793746
           12  |   .5106891   .0584387     8.74   0.000     .3961513     .625227
               |
         _cons |   .3114035   .0297229    10.48   0.000     .2531477    .3696593
--------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                   sd(_cons) |   .0451666   .1760276      .0000217    93.80042
-----------------------------+------------------------------------------------
                sd(Residual) |   .4465272   .0237075      .4023973    .4954968
------------------------------------------------------------------------------
LR test vs. linear regression: chibar2(01) =     0.02 Prob >= chibar2 = 0.4489

. test 6.i_rando_visite=12.i_rando_visite

 ( 1)  [reco]6.i_rando_visite - [reco]12.i_rando_visite = 0

           chi2(  1) =    4.12
         Prob > chi2 =    0.0424

But here i can test the 1 year difference between groups that gives a significant p=0.04

And lastly here are the results obtained with R

Code:

Generalized least squares fit by REML
  Model: reco ~ Xalt
  Data: thermactive06
       AIC      BIC    logLik
  516.3054 540.1787 -252.1527

Correlation Structure: General
 Formula: ~1 | id
 Parameter estimate(s):
 Correlation:
  1   
2 0.01
Variance function:
 Structure: Different standard deviations per stratum
 Formula: ~1 | visite
 Parameter estimates:
        0         6
1.0000000 0.9297394

Coefficients:
                       Value  Std.Error   t-value p-value
(Intercept)        0.3114035 0.03073486 10.131931  0.0000
Xaltvisite6        0.3705615 0.05452385  6.796319  0.0000
Xaltvisite6:rando2 0.1401856 0.06634811  2.112881  0.0352

 Correlation:
                   (Intr) Xltvs6
Xaltvisite6        -0.558       
Xaltvisite6:rando2  0.000 -0.565

Standardized residuals:
       Min         Q1        Med         Q3        Max
-1.9054231 -0.6710035 -0.6710035  0.7370806  1.4837683

Residual standard error: 0.4640863
Degrees of freedom: 398 total; 395 residual

that gives also nearly equal coefficients
intercept (baseline) =0.3114035
group 1 post treatment= +0.3705615
group 2 post treatment= +0.3705615+0.140185 = 0.5107465

and the groups difference = 0.140185 with p=0.0352

Comment

Weiwen Ng

Join Date: Jun 2015
Posts: 1241

16 Nov 2018, 15:47

Aurelien,

I am not sure how to interpret a mixed model in the syntax you gave. I've prefered to specify the full interaction. Here's selected output with your data example. Note the two ##s in the factor variable syntax.

Code:

version 12
xtmixed reco i.rando##i.visite || id:

------------------------------------------------------------------------------
        reco |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     2.rando |  -.3333333    .248452    -1.34   0.180    -.8202903    .1536236
    6.visite |         -1   .3333332    -3.00   0.003    -1.653321   -.3466788
             |
rando#visite |
        2 6  |   1.333333   .4006167     3.33   0.001      .548139    2.118528
             |
       _cons |          1   .1924501     5.20   0.000     .6228048    1.377195
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                   sd(_cons) |    .000237   .0021194      5.78e-12    9710.597
-----------------------------+------------------------------------------------
                sd(Residual) |   .2721654   .0641501      .1714757    .4319797
------------------------------------------------------------------------------

test 2.rando = 0, coef
 ( 1)  [reco]2.rando = 0

           chi2(  1) =    1.80
         Prob > chi2 =    0.1797


Constrained coefficients

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
reco         |
     2.rando |          0  (omitted)
    6.visite |  -.8000002   .2981424    -2.68   0.007    -1.384348   -.2156519
             |
rando#visite |
        2 6  |          1   .3142697     3.18   0.001      .384043    1.615958
             |
       _cons |         .8   .1217161     6.57   0.000     .5614408    1.038559
-------------+----------------------------------------------------------------

Again, with my syntax, constraining 2.rando to be equal to zero means that you are constraining 2.rando at time = 0 to be zero. (Because it's an interacted model.) You're interested in the between-group difference at 6 months. in the first output using your example data, it's rando2#visite6, which has a beta of 1.333 (bolded line). With that constraint, my understanding of the -test ..., coef- output is that the between-group difference at 6 months is now 1. I think that was what you were interested in.

With your syntax, you'd have to use the linear combination command if you wanted to get -mixed- to give the between group difference at 6 months. I haven't reviewed your own syntax, because I see you are manually generating interaction terms, and I would rather let the computer do it for me.

If you want to test the above syntax in my data example, you may have noticed that it won't work. That's because I forgot the code that generated y at time = 1. My apologies! The corrected code for my fictitious example is:

Code:

set seed 1001
clear
set obs 1000
gen id = _n
gen treat = id <= 500
gen slope = rnormal(0.5*treat,0.25)
expand 2, generate(time)
gen y = rnormal() if time == 0
sort id time
by id: replace y = y[_n-1] + slope + rnormal(0,0.5) if  _n == 2
mixed y i.time##i.treat || id:
test 1.time = 0, coef

Last edited by Weiwen Ng; 16 Nov 2018, 15:51.

Be aware that it can be very hard to answer a question without sample data. You can use the dataex command for this. Type help dataex at the command line.

When presenting code or results, please use the code delimiters format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

Comment

aurelien mulliez

Join Date: Nov 2018
Posts: 14

22 Nov 2018, 03:34

Hello,
Thank you for the reply.
Using the syntax you suggested, stata computes

Code:

Constrained coefficients

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
reco         |
     2.rando |          0  (omitted)
    6.visite |   .3706172    .055382     6.69   0.000     .2620705    .4791638
             |
rando#visite |
       2 6  |   .1400753   .0688797     2.03   0.042     .0050736     .275077
             |
       _cons |   .3114035   .0296658    10.50   0.000     .2532597    .3695473
-------------+----------------------------------------------------------------
lns1_1_1     |
       _cons |  -3.101718   3.901959    -0.79   0.427    -10.74942    4.545982
-------------+----------------------------------------------------------------
lnsig_e      |
       _cons |  -.8081542   .0529875   -15.25   0.000    -.9120078   -.7043006

And ok the between group effect post-treatment (bolded line) is significant p=0.042 with a difference = 0.14 (that i calculated 0.51-0.37 previously)
And that is a result near to be (but not) equal to R result.

After examinating more in detail the R code, i tried to adapt it more accurately in stata and i think, that with the commands :

Code:

xtmixed reco i.rando##i.visite   || id:visite, covariance(unstructured)
test 2.rando = 0, coef



Constrained coefficients

------------------------------------------------------------------------------
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
reco         |
     2.rando |          0  (omitted)
    6.visite |   .3705616   .0542116     6.84   0.000     .2643088    .4768145
             |
rando#visite |
        2 6  |   .1401853   .0659567     2.13   0.034     .0109125    .2694581
             |
       _cons |   .3114035   .0305707    10.19   0.000     .2514861    .3713209
-------------+----------------------------------------------------------------
lns1_1_1     |
       _cons |  -2.348813   27.87619    -0.08   0.933    -56.98515    52.28752
-------------+----------------------------------------------------------------
lns1_1_2     |
       _cons |  -.8556513   25.32567    -0.03   0.973    -50.49305    48.78174
-------------+----------------------------------------------------------------
atr1_1_1_2   |
       _cons |  -.9367696   3.163538    -0.30   0.767     -7.13719    5.263651
-------------+----------------------------------------------------------------
lnsig_e      |
       _cons |  -1.714016   140.9697    -0.01   0.990    -278.0096    274.5816

The results trend to be more equal -stil not- but i think that the small difference is neglectable

Many thanks for the help

Last edited by aurelien mulliez; 22 Nov 2018, 03:40.

Announcement