
  • Correlation between repeated measures over time

    Hello – I am new to Stata (14.0). I have an observational dataset of about 1,000 subjects with repeated measures over time of two lab variables and I am interested in whether or not they are correlated over time to help inform some simulation scenarios. The time intervals are not the same for each patient – i.e. subj1 may have values at t=0, 30, 126, 300 etc., subj2 t=0, 55, 200 etc. The number of measures per subject also varies. I am uncertain what would be the best approach to this. Thanks for considering.

  • #2
    Welcome to the Forum.

    To me, this looks like an unbalanced panel dataset. Whether time-series methods are also appropriate would depend on the variables and on the field of research.

    The Stata manual entry for xtset (http://www.stata.com/manuals13/xtxtset.pdf) gives a short description of balanced and unbalanced panel data:

    The terms balanced and unbalanced are often used to describe whether a panel dataset is missing some observations. If a dataset does not contain a time variable, then panels are considered balanced if each panel contains the same number of observations; otherwise, the panels are unbalanced. When the dataset contains a time variable, panels are said to be strongly balanced if each panel contains the same time points, weakly balanced if each panel contains the same number of observations but not the same time points, and unbalanced otherwise.
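A minimal sketch of how one might check which of these cases applies, assuming the data are in long format (one row per subject per measurement). The variable names subject_id and day are hypothetical, and xtset with a time variable assumes integer-valued times with no repeated times within a subject:

```stata
* Hypothetical names: subject_id = patient identifier, day = measurement time
xtset subject_id day          // reports whether the panel is balanced or unbalanced
xtdescribe                    // summarizes the patterns of observed time points

* Number of measurements per subject, tabulated once per subject
bysort subject_id (day): generate int n_measures = _N
bysort subject_id (day): generate byte first_row = _n == 1
tabulate n_measures if first_row
```

With measurement times that differ from subject to subject, as described in the question, xtset would be expected to report an unbalanced panel.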
    Last edited by Marcos Almeida; 30 Sep 2015, 07:47.
    Best regards,

    Marcos



    • #3
      It looks like your best bet would be to use Stata's Structural Equation Modeling (SEM) capabilities. I've shown an example below (start at the "Begin here" comment; the beginning of the do-file just creates a dataset for illustration, one that simulates the characteristics of the one you describe, i.e., a thousand patients with repeated measurements of two clinical laboratory analytes, time intervals not the same for each patient, number of measurements not the same for each patient).

      I've attached the do-file itself so that you can experiment with the degree of correlation between the two analyte values: just vary the value of the local macro named covariance and see what happens to the estimated covariance (and the variances) for the two analytes (value1 and value2 in the illustrative example).

      In the illustration, I've used a simple linear relationship between time and the analytes' values. You can change that (splines, random slopes etc.) to suit your data.
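As a rough sketch of one such change, the linear time term in the gsem call at the "Begin here" comment could be swapped for a restricted cubic spline. The stub name time_s and the choice of three knots are assumptions made purely for illustration, not recommendations for any particular dataset:

```stata
* Sketch only: restricted cubic spline in time instead of a single linear term.
* time_s is a hypothetical stub name; nknots(3) is an arbitrary choice.
* With 3 knots, mkspline creates two variables: time_s1 and time_s2.
mkspline time_s = time, cubic nknots(3)

gsem (value1 value2 <- time_s1 time_s2 M1[pid]), ///
        covariance(e.value1*e.value2) nolog
```

The random intercept M1[pid] and the residual covariance are specified exactly as in the linear version, so the covariance estimate keeps the same interpretation.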

      One thing that you'll need to keep in mind is that typical clinical laboratory test results will often be distributed nonnormally, especially if something is going on with the patient's health. If that's what's happening in your data, then you might need to look into transformations (e.g., logarithmic) if you want to use a linear model as below (in terms of the user's manual, an "uncensored gaussian response with the identity link"). An alternative is to use a different distribution family, a different link or both in the gsem model, but specifying a random factor to measure association between the two analytes' values becomes a bit trickier.
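A minimal sketch of the transformation route, assuming only one of the two analytes needs a log transform. The names lab1 and lab2 are hypothetical stand-ins for the real laboratory variables, and the log is defined only for strictly positive values:

```stata
* lab1 and lab2 are hypothetical names for the two laboratory variables
generate double ln_lab1 = ln(lab1) if lab1 > 0   // missing for nonpositive values

* Same model as the illustration, with the transformed analyte as the first response
gsem (ln_lab1 lab2 <- time M1[pid]), ///
        covariance(e.ln_lab1*e.lab2) nolog
```

The estimated covariance then describes association between the log of the first analyte and the untransformed second analyte, which is worth bearing in mind when carrying it into a simulation.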

      . version 14.0

      .
      . clear *

      . set more off

      . set seed `=date("2015-10-01", "YMD")'

      .
      . // Covariance is assigned to local macro below
      . local covariance 2

      .
      . quietly set obs 1000

      .
      . generate int pid = _n

      . generate double pid_u = rnormal()

      .
      . local a 1

      . local b 300

      .
      . forvalues time_point = 1/5 {
        2.         generate double time_point_u`time_point' = ///
      >                 sqrt(`covariance') * rnormal() // <- this is the common variance (i.e., covariance)
        3.         generate int time`time_point' = floor((`b' - `a' + 1) * runiform() + `a')
        4.         local varlist `varlist' value`time_point'
        5.         forvalues analyte = 1/2 {
        6.                 generate double value`time_point'`analyte' = ///
      >                         pid_u + time_point_u`time_point' + `analyte' * rnormal()
        7.         }
        8. }

      .
      . quietly reshape long `varlist', i(pid) j(analyte)

      . quietly reshape long time_point_u time value, i(pid analyte) j(time_point)

      . quietly reshape wide value, i(pid time_point) j(analyte)

      .
      . quietly replace time = 0 if time_point == 1 // All patients have time zero in common

      . quietly drop if runiform() < 0.05 // Patients do not all have the same number of observations

      .
      . *
      . * Begin here
      . *
      . gsem (value1 value2 <- time M1[pid]), covariance(e.value1*e.value2) nolog

      Generalized structural equation model           Number of obs     =      4,740

      Response       : value1
      Family         : Gaussian
      Link           : identity

      Response       : value2
      Family         : Gaussian
      Link           : identity

      Log likelihood = -20250.302

       ( 1)  [value1]M1[pid] = 1
      ---------------------------------------------------------------------------------------
                            |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
      ----------------------+----------------------------------------------------------------
      value1 <-             |
                       time |   .0001861    .000269     0.69   0.489    -.0003412    .0007135
                            |
                    M1[pid] |          1  (constrained)
                            |
                      _cons |   -.093744   .0527168    -1.78   0.075     -.197067     .009579
      ----------------------+----------------------------------------------------------------
      value2 <-             |
                       time |   .0006858    .000373     1.84   0.066    -.0000452    .0014169
                            |
                    M1[pid] |   .9612136   .0399843    24.04   0.000     .8828459    1.039581
                            |
                      _cons |  -.1093078   .0655745    -1.67   0.096    -.2378315    .0192158
      ----------------------+----------------------------------------------------------------
                var(M1[pid])|    1.09545   .0790476                      .9509768    1.261872
      ----------------------+----------------------------------------------------------------
               var(e.value1)|   3.037117    .070184                      2.902627    3.177838
               var(e.value2)|   6.106722   .1363088                      5.845321    6.379812
      ----------------------+----------------------------------------------------------------
      cov(e.value2,e.value1)|   2.087412   .0778129    26.83   0.000     1.934901    2.239922
      ---------------------------------------------------------------------------------------

      .
      . /* Reference for generating random integers (used to make times of measurement different for each patient):
      >    http://blog.stata.com/tag/random-numbers/ */
      .
      .  exit

      end of do-file
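Since the original question was about correlation rather than covariance, it may help to write out the fitted model and convert the estimates in the table above into a correlation. The notation below is an assumption introduced here for illustration (i indexes patients, j indexes measurement occasions, lambda is the loading of value2 on the patient-level random intercept, estimated as .9612 above); the numbers are taken directly from the output:

```latex
\begin{aligned}
\text{value1}_{ij} &= \beta_{10} + \beta_{11}\,\text{time}_{ij} + M_{1i} + e_{1ij} \\
\text{value2}_{ij} &= \beta_{20} + \beta_{21}\,\text{time}_{ij} + \lambda\,M_{1i} + e_{2ij} \\[4pt]
\hat\rho_e &= \frac{\widehat{\operatorname{cov}}(e_1, e_2)}
                  {\sqrt{\widehat{\operatorname{var}}(e_1)\,\widehat{\operatorname{var}}(e_2)}}
            = \frac{2.087412}{\sqrt{3.037117 \times 6.106722}} \approx 0.485
\end{aligned}
```

This is the correlation between the two analytes' residuals at the same measurement occasion, after accounting for time and the shared patient-level intercept. In the simulation, each residual is the common component (variance 2) plus an analyte-specific error with variance 1 for value1 and 4 for value2, so the value being estimated is 2 / sqrt(3 × 6) ≈ 0.471, which the estimate is close to.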




      • #4
        Thank you - that is most helpful! One of the lab variables is indeed non-normally distributed and I would use log transformed values. The second variable is slightly skewed but for my purposes I can leave it non-transformed. Thanks again!
