Correlation between repeated measures over time

Scott Brimble

Join Date: Sep 2015

Posts: 2
#1

30 Sep 2015, 06:54

Hello – I am new to Stata (14.0). I have an observational dataset of about 1,000 subjects with repeated measures over time of two lab variables, and I would like to know whether the two variables are correlated over time, to help inform some simulation scenarios. The time intervals differ between patients – e.g., subj1 may have values at t=0, 30, 126, 300, etc., and subj2 at t=0, 55, 200, etc. The number of measures per subject also varies. I am uncertain what the best approach would be. Thanks for considering.
Tags: panel data, Time Series
Marcos Almeida

Join Date: Apr 2014

Posts: 4047
#2

30 Sep 2015, 07:45

Welcome to the Forum.

To me, this looks like an unbalanced panel dataset. Whether time-series methods also apply would depend on the variables and the field of research.

The Stata manual entry for xtset (http://www.stata.com/manuals13/xtxtset.pdf) gives a short description of unbalanced panel data:

The terms balanced and unbalanced are often used to describe whether a panel dataset is missing some observations. If a dataset does not contain a time variable, then panels are considered balanced if each panel contains the same number of observations; otherwise, the panels are unbalanced. When the dataset contains a time variable, panels are said to be strongly balanced if each panel contains the same time points, weakly balanced if each panel contains the same number of observations but not the same time points, and unbalanced otherwise.
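
As a rough sketch of how one might check this in Stata, xtset declares the panel structure and xtdescribe summarizes the pattern of observations per panel. The toy data and the variable names pid, time, value1, and value2 below are made up for illustration and are not taken from the original dataset.

```stata
* Toy data in long form (hypothetical values): one row per patient visit
clear
input int pid int time double value1 double value2
1   0 4.1 10.2
1  30 4.4 11.0
1 126 3.9  9.8
1 300 4.6 10.9
2   0 5.2 12.1
2  55 5.0 11.7
2 200 5.5 12.4
end

* Declare the panel structure; time is measured in days from baseline
xtset pid time

* Summarize how many observations each patient contributes
xtdescribe
```

With a different number of visits per patient, as here, xtset should report the panel as unbalanced.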

Last edited by Marcos Almeida; 30 Sep 2015, 07:47.

Best regards,

Marcos
Joseph Coveney

Join Date: Apr 2014

Posts: 4389
#3

30 Sep 2015, 20:18

It looks like your best bet would be to use Stata's structural equation modeling (SEM) capabilities, specifically gsem. I've shown an example below. Start at the "Begin here" comment; the first part of the do-file just creates a dataset for illustration, one that simulates the characteristics you describe: a thousand patients with repeated measurements of two clinical laboratory analytes, time intervals that differ between patients, and a number of measurements that differs between patients.

I've attached the do-file itself so that you can play around with the degree of correlation between the two analyte values: just vary the value of the local macro named covariance. You can then see what happens to the estimated covariance (and the variances) for the two analytes (value1 and value2 in the illustrative example).

In the illustration, I've used a simple linear relationship between time and the analytes' values. You can change that (splines, random slopes etc.) to suit your data.
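
For instance, a piecewise-linear (linear spline) time trend could be sketched as follows. This is only an illustration under assumptions: the simulated data, the knot at day 100, and the names time_a and time_b are hypothetical choices and are not part of the attached do-file.

```stata
* Simulate a small hypothetical dataset: 200 patients, 4 visits each
clear
set seed 20151001
set obs 200
generate int pid = _n
generate double pid_u = rnormal()              // patient-level random intercept
expand 4                                       // four visits per patient
bysort pid: generate int time = cond(_n == 1, 0, floor(300 * runiform()) + 1)
generate double shared = sqrt(2) * rnormal()   // noise shared by both analytes at a visit
generate double trend  = 0.02 * min(time, 100) // rises until day 100, then flat
generate double value1 = pid_u + shared + trend + rnormal()
generate double value2 = pid_u + shared + trend + 2 * rnormal()

* Linear spline in time with one (arbitrarily chosen) knot at day 100
mkspline time_a 100 time_b = time

* Same model as in the illustration, with the spline terms replacing time
gsem (value1 value2 <- time_a time_b M1[pid]), covariance(e.value1*e.value2) nolog
```

The knot location would need to be chosen from subject-matter knowledge or from inspection of the data; mkspline's cubic option is an alternative when a smooth curve is preferred.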

One thing that you'll need to keep in mind is that typical clinical laboratory test results will often be distributed nonnormally, especially if something is going on with the patient's health. If that's what's happening in your data, then you might need to look into transformations (e.g., logarithmic) if you want to use a linear model as below (in terms of the user's manual, an "uncensored gaussian response with the identity link"). An alternative is to use a different distribution family, a different link or both in the gsem model, but specifying a random factor to measure association between the two analytes' values becomes a bit trickier.
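
A minimal sketch of the logarithmic transformation mentioned above, using a made-up right-skewed variable. The lognormal parameters and the name ln_value1 are assumptions for illustration only.

```stata
* Hypothetical right-skewed analyte: lognormally distributed values
clear
set seed 20151001
set obs 1000
generate double value1 = exp(1 + 0.8 * rnormal())

* Log-transform; ln() returns missing for zero or negative values
generate double ln_value1 = ln(value1)

* Compare skewness before and after the transformation
summarize value1 ln_value1, detail
```

The transformed variable can then take the place of the original in the gsem command, keeping in mind that the estimated variances and covariance are then on the log scale for that analyte.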

. version 14.0

.
. clear *

. set more off

. set seed `=date("2015-10-01", "YMD")'

.
. // Covariance is assigned to local macro below
. local covariance 2

.
. quietly set obs 1000

.
. generate int pid = _n

. generate double pid_u = rnormal()

.
. local a 1

. local b 300

.
. forvalues time_point = 1/5 {
  2.         generate double time_point_u`time_point' = ///
>                 sqrt(`covariance') * rnormal() // <- this is the common variance (i.e., covariance)
  3.         generate int time`time_point' = floor((`b' - `a' + 1) * runiform() + `a')
  4.         local varlist `varlist' value`time_point'
  5.         forvalues analyte = 1/2 {
  6.                 generate double value`time_point'`analyte' = ///
>                         pid_u + time_point_u`time_point' + `analyte' * rnormal()
  7.         }
  8. }

.
. quietly reshape long `varlist', i(pid) j(analyte)

. quietly reshape long time_point_u time value, i(pid analyte) j(time_point)

. quietly reshape wide value, i(pid time_point) j(analyte)

.
. quietly replace time = 0 if time_point == 1 // All patients have time zero in common

. quietly drop if runiform() < 0.05 // Patients do not all have the same number of observations

.
. *
. * Begin here
. *
. gsem (value1 value2 <- time M1[pid]), covariance(e.value1*e.value2) nolog

Generalized structural equation model           Number of obs     =      4,740

Response       : value1
Family         : Gaussian
Link           : identity

Response       : value2
Family         : Gaussian
Link           : identity

Log likelihood = -20250.302

 ( 1)  [value1]M1[pid] = 1
---------------------------------------------------------------------------------------
                      |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------------+----------------------------------------------------------------
value1 <-             |
                 time |   .0001861    .000269     0.69   0.489    -.0003412    .0007135
                      |
              M1[pid] |          1  (constrained)
                      |
                _cons |   -.093744   .0527168    -1.78   0.075     -.197067     .009579
----------------------+----------------------------------------------------------------
value2 <-             |
                 time |   .0006858    .000373     1.84   0.066    -.0000452    .0014169
                      |
              M1[pid] |   .9612136   .0399843    24.04   0.000     .8828459    1.039581
                      |
                _cons |  -.1093078   .0655745    -1.67   0.096    -.2378315    .0192158
----------------------+----------------------------------------------------------------
          var(M1[pid])|    1.09545   .0790476                      .9509768    1.261872
----------------------+----------------------------------------------------------------
         var(e.value1)|   3.037117    .070184                      2.902627    3.177838
         var(e.value2)|   6.106722   .1363088                      5.845321    6.379812
----------------------+----------------------------------------------------------------
cov(e.value2,e.value1)|   2.087412   .0778129    26.83   0.000     1.934901    2.239922
---------------------------------------------------------------------------------------

.
. /* Reference for generating random integers (used to make times of measurement different for each patient):
>    http://blog.stata.com/tag/random-numbers/ */
.
.  exit

end of do-file

.
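
If a correlation rather than a covariance is wanted, the residual covariance can be rescaled by the two residual standard deviations. The sketch below simply plugs in the estimates printed in the gsem output; the scalar names are arbitrary.

```stata
* Estimates copied by hand from the gsem output
scalar cov12 = 2.087412   // cov(e.value2,e.value1)
scalar var1  = 3.037117   // var(e.value1)
scalar var2  = 6.106722   // var(e.value2)

* Correlation = covariance / (product of standard deviations)
scalar rho_resid = cov12 / sqrt(var1 * var2)
display "Residual correlation: " %5.3f rho_resid
```

This gives roughly 0.48, close to the value implied by the simulation settings, 2/sqrt(3*6) ≈ 0.47. Note that this is the correlation between the two analytes' residuals within the same observation; it does not include the association induced by the shared patient-level random intercept M1[pid].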
Attached Files

Brimble.do (1.3 KB, 1 view)
Scott Brimble

Join Date: Sep 2015

Posts: 2
#4

01 Oct 2015, 06:26

Thank you - that is most helpful! One of the lab variables is indeed non-normally distributed, and I will use log-transformed values for it. The second variable is slightly skewed, but for my purposes I can leave it untransformed. Thanks again!