Visualizing results after running linear piece-wise spline modelling

Amal Khanolkar

Join Date: Feb 2015
Posts: 142

Visualizing results after running linear piece-wise spline modelling

09 Apr 2018, 09:16

Hi All

I'm analyzing the association between y (a clinical variable) and time (time since diagnosis of diabetes), adjusting for gender and at age at diagnosis. We already know from existing literature and the raw data, that y has a non-linear relationship with time since diagnosis. I've fit a multilevel piece-wise linear spline model to account for the non-linearity of the data with knots at 1, 2, 4, 8, 12, 17, 24 & 34 months after diagnosis (total follow-up time 72 months). The data is longitudinal with subjects having multiple measurements of Y, so essentially this is a growth curve analysis (random slope & random coefficient model):

Code:

mkspline durm1 1 durm2 2 durm4 4 durm8 8 durm12 12 durm17 17 durm24 24 durm34 34 durm36 = durationm
 
mixed y durm1 durm2 durm4 durm8 durm12 durm17 durm24 durm34 durm36 sexnew diagagenew || id: durationm, cov(unstr) mle, if diabtype==1 & audityr>2010 & durationall<6.1
 
Performing EM optimization:
 
Performing gradient-based optimization:
 
Iteration 0:   log likelihood = -739203.39 
Iteration 1:   log likelihood = -739203.15 
Iteration 2:   log likelihood = -739203.15 
 
Computing standard errors:
 
Mixed-effects ML regression                     Number of obs     =    187,201
Group variable: id                              Number of groups  =     25,677
 
                                                Obs per group:
                                                              min =          1
                                                              avg =        7.3
                                                              max =         36
 
                                                Wald chi2(11)     =   36063.34
Log likelihood = -739203.15                     Prob > chi2       =     0.0000
 
------------------------------------------------------------------------------
             y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       durm1 |   -18.9578   .2976129   -63.70   0.000    -19.54111   -18.37449
       durm2 |  -11.29657   .3486496   -32.40   0.000    -11.97991   -10.61323
       durm4 |  -4.899221   .1451236   -33.76   0.000    -5.183658   -4.614784
       durm8 |    1.85443   .0570494    32.51   0.000     1.742615    1.966245
      durm12 |    .414913   .0476722     8.70   0.000     .3214771    .5083489
      durm17 |   .2890149   .0343775     8.41   0.000     .2216362    .3563936
      durm24 |   .2048707   .0219777     9.32   0.000     .1617952    .2479462
      durm34 |   .1576019   .0133097    11.84   0.000     .1315154    .1836883
      durm36 |   .0369482   .0055738     6.63   0.000     .0260238    .0478727
      sexnew |   1.399622   .1986482     7.05   0.000     1.010279    1.788965
  diagagenew |   .4907482   .0241491    20.32   0.000     .4434168    .5380795
       _cons |   91.32118   .4392206   207.92   0.000     90.46032    92.18204
------------------------------------------------------------------------------
 
------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Unstructured             |
               var(durati~m) |   .1717863   .0028512      .1662881    .1774663
                  var(_cons) |   262.7687   3.711764      255.5935    270.1452
         cov(durati~m,_cons) |  -3.667118   .0833349     -3.830452   -3.503785
-----------------------------+------------------------------------------------
               var(Residual) |   99.06962   .3773554      98.33277    99.81199
------------------------------------------------------------------------------
LR test vs. linear model: chi2(3) = 1.3e+05               Prob > chi2 = 0.0000
 
Note: LR test is conservative and provided only for reference.

I would now like to visualize the data post regression - plot y with time since diagnosis first for all subjects and then separately for males and females. What is the best way to do this?

I've been told margins and marginsplot (which I generally use) doesn't work too well for linear spline models. Is this true? There is a user written package to help plot data post regression when using restricted cubic splines (the 'postrcspline' package).

Is there a way to plot what I would like to see? Due to the nature of the data, it is stored on a secure online server and I'm unable to download and run any user-written commands which complicates matters.....

Thanks!
/Amal

Tags: None

Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#2

09 Apr 2018, 12:58

Before you proceed, your model is mis-specified. You should not attempt to interpret or plot anything from these results.

The problem is that you have asked Stata to fit random slopes on the variable durationm, but durationm does not appear among the fixed effects in the model. Consequently your random slopes are calculated subject to the constraint that the mean slope on durationm is zero, which I suspect is not even remotely realistic. I realize that durationm is indirectly represented in the fixed effects by the sline variables durm*. But that does not do the trick needed here.

I think what you need to do is include random slopes on the durm* variables instead of on durationm.

Note, however, that with 9 such random slopes, the use of cov(unstr) is going to force Stata to try to estimate a 9x9 covariance matrix. That's 81 extra parameters in your model. While your data set looks like it may be large enough to support that, it is going to take a very long time to do all that computing. Are you really interested in these 81 parameters? (I'm guessing you're not, given that you didn't even plan to include them in the model.) If not, I would simplify the model to just estimate cov(independent) or, if you have some interest in the approximate covariance among these random slopes, perhaps cov(exchangeable).
Comment

Amal Khanolkar

Join Date: Feb 2015
Posts: 142

10 Apr 2018, 08:43

Hi Clyde

Thanks so much - What you say make total sense and the spline variables (durm1 to durm36) should be included in the random part of the model. It substantially increase the number parameters to be tested and with nine knots (plus other categorical variables that I will eventually add to model in addition to interactions) makes things quite complicated. So, I decided to take a step back and keep things simple for now. I've run two 'simpler' random intercept only models with few knots - 1. One model for the overall trajectory of y and 2. the same model but including interactions with gender (a likelihood ratio test shows the model with interactions has a significantly better fit. Girls have a higher y at at time 0 and marginally higher during follow-up):

Code:

 
 mkspline durm2 2 durm8 8 durm16 16 durm28 28 durm38 = durationm eststo: mixed hba1cmol durm2 durm8 durm16 durm28 durm38 sexnew diagagenew || id:, cov(unstr) mle, if diabtype==1 & audityr>2010 & durationall<6.1 & ethnicnew!=.
Note: single-variable random-effects specification in id equation; covariance structure set to identity
 
Performing EM optimization:
 
Performing gradient-based optimization:
 
Iteration 0:   log likelihood = -712484.96 
Iteration 1:   log likelihood = -712484.96 
 
Computing standard errors:
 
Mixed-effects ML regression                     Number of obs     =    179,066
Group variable: id                              Number of groups  =     22,693
 
                                                Obs per group:
                                                              min =          1
                                                              avg =        7.9
                                                              max =         36
 
                                                Wald chi2(7)      =   21093.83
Log likelihood = -712484.96                     Prob > chi2       =     0.0000
 
------------------------------------------------------------------------------
    hba1cmol |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       durm2 |  -16.11096   .1356142  -118.80   0.000    -16.37676   -15.84516
       durm8 |   .0841984   .0338622     2.49   0.013     .0178297    .1505672
      durm16 |   .5613997   .0200162    28.05   0.000     .5221687    .6006307
      durm28 |   .1208665   .0103057    11.73   0.000     .1006677    .1410654
      durm38 |    .039808   .0033696    11.81   0.000     .0332037    .0464123
      sexnew |   1.486144   .2000963     7.43   0.000     1.093963    1.878326
  diagagenew |   .5903004   .0248777    23.73   0.000      .541541    .6390598
       _cons |   89.12814   .4500975   198.02   0.000     88.24597    90.01032
------------------------------------------------------------------------------
 
------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                  var(_cons) |   201.6421   2.159983      197.4528    205.9204
-----------------------------+------------------------------------------------
               var(Residual) |   122.4032   .4388346      121.5462    123.2664
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 1.1e+05       Prob >= chibar2 = 0.0000
(est1 stored)
 
. est store a
 
. estat ic
 
Akaike's information criterion and Bayesian information criterion
 
-----------------------------------------------------------------------------
       Model |        Obs  ll(null)  ll(model)      df         AIC        BIC
-------------+---------------------------------------------------------------
           a |    179,066         .    -712485      10     1424990    1425091
-----------------------------------------------------------------------------
               Note: N=Obs used in calculating BIC; see [R] BIC note.
 
. estat icc
 
Residual intraclass correlation
 
------------------------------------------------------------------------------
                       Level |        ICC   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------

                          id |   .6222651   .0026972      .6169643    .6275369

 
. eststo: mixed hba1cmol c.durm2##i.sexnew c.durm8##i.sexnew c.durm16##i.sexnew c.durm28##i.sexnew c.durm38##i.sexnew diagagenew || id:, cov(unstr) mle, if diabtype==1 & audityr>2010 & durationall<6.1 & ethnicnew!=.
Note: single-variable random-effects specification in id equation; covariance structure set to identity
 
Performing EM optimization:
 
Performing gradient-based optimization:
 
Iteration 0:   log likelihood = -712453.89 
Iteration 1:   log likelihood = -712453.89 
 
Computing standard errors:
 
Mixed-effects ML regression                     Number of obs     =    179,066
Group variable: id                              Number of groups  =     22,693
 
                                                Obs per group:
                                                              min =          1
                                                              avg =        7.9
                                                              max =         36
 
                                                Wald chi2(12)     =   21163.73
Log likelihood = -712453.89                     Prob > chi2       =     0.0000
 
---------------------------------------------------------------------------------
       hba1cmol |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
----------------+----------------------------------------------------------------
          durm2 |  -15.71934   .1862358   -84.41   0.000    -16.08436   -15.35433
                |
         sexnew |
        female  |   4.098497   .4847686     8.45   0.000     3.148368    5.048626
                |
 sexnew#c.durm2 |
        female  |   -.827848   .2716672    -3.05   0.002    -1.360306   -.2953901
                |
          durm8 |   .1450891   .0464176     3.13   0.002     .0541122    .2360659
                |
 sexnew#c.durm8 |
        female  |   -.132021    .067851    -1.95   0.052    -.2650065    .0009645
                |
         durm16 |   .6030136    .027526    21.91   0.000     .5490637    .6569635
                |
sexnew#c.durm16 |
        female  |   -.087893   .0400893    -2.19   0.028    -.1664667   -.0093194
                |
         durm28 |   .1035275   .0141773     7.30   0.000     .0757405    .1313144
                |
sexnew#c.durm28 |
        female  |    .036824   .0206396     1.78   0.074    -.0036289    .0772768
                |
         durm38 |   .0397364   .0046462     8.55   0.000     .0306301    .0488428
                |
sexnew#c.durm38 |
        female  |   .0001662   .0067207     0.02   0.980    -.0130061    .0133385
                |
     diagagenew |   .5903152   .0248798    23.73   0.000     .5415516    .6390787
          _cons |   89.38644   .4054406   220.47   0.000     88.59179    90.18109
---------------------------------------------------------------------------------
 
------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
id: Identity                 |
                  var(_cons) |   201.6354   2.159868      197.4463    205.9134
-----------------------------+------------------------------------------------
               var(Residual) |   122.3559   .4386657      121.4991    123.2187
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 1.1e+05       Prob >= chibar2 = 0.0000
(est1 stored)
 
. est store b
. estat ic
 
Akaike's information criterion and Bayesian information criterion
 
-----------------------------------------------------------------------------
       Model |        Obs  ll(null)  ll(model)      df         AIC        BIC
-------------+---------------------------------------------------------------
           b |    179,066         .  -712453.9      15     1424938    1425089
-----------------------------------------------------------------------------
               Note: N=Obs used in calculating BIC; see [R] BIC note.
 
. estat icc
 
Residual intraclass correlation
 
------------------------------------------------------------------------------
                       Level |        ICC   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
                          id |   .6223482   .0026969       .617048    .6276194
------------------------------------------------------------------------------
 
. lrtest a b
 
Likelihood-ratio test                                 LR chi2(5)  =     62.14
(Assumption: a nested in b)                           Prob > chi2 =    0.0000

Do you have suggestions on the best way to plot the above? And whether margins are not suitable for linear spline models?

Thanks!
/Amal

Comment

Clyde Schechter

Join Date: Apr 2014

Posts: 30118
#4

10 Apr 2018, 09:19

So, the challenge is that you need the values of durm* corresponding to interesting values of duration that you want to use along the horizontal axis. The easiest case is if the values you choose are values that actually occur in your data set. So, I'm going to assume that you actually have observations for which duration = 2, 8, 16, 28, and 38. Then you can capture the corresponding values of the spline variables as follows:

Code:

foreach d or numlist 2 8 16 28 38 { local at_`d' foreach v of varlist durm* { summ `v' if duration == `d', meanonly local at_`d' `at_`d'' `v' = `r(mean)' } } margins sexnew, at(`at_2') at(`at_8') at(`at_16') (`at_28') (`at_38') saving(margins_output, replace)

Then you want to use the new margins_output.dta data set and generate a variable, duration, that corresponds to the corresponding sets of values of durm*. Then you can use regular graphing commands.

Note: Not tested, there may be typos or other errors here, but this is the gist of it.

I don't know of a simpler way to do this, though there may be one. If somebody else does, I hope he or she will chime in.
Comment

Announcement

Visualizing results after running linear piece-wise spline modelling

Comment

Comment

Comment