I am trying to model the intensity of a care intervention for an individual (measured in time) as a function of the time that the person has been known to a service.

The data has observations on 1,766 individuals, with a mean of 4.3 observations per person, and is structured as follows:

- id: Person ID
- intensity: Dependent variable
- t: Time of the snapshot (taken at six-month intervals)
- months_open_60: Months the individual has been open to the service at the time of the snapshot (mean-centred, so it has negative values)
- mo1, mo2, mo3: Orthogonal linear, quadratic and cubic polynomials for months_open_60

Code:
     +----------------------------------------------------------------------+
     | id   intensity   t   month~60         mo1          mo2          mo3  |
     |----------------------------------------------------------------------|
  1. |  1   9.6994724   1         12   .45757686   -.72035168   -.84513099  |
  2. |  1   8.8730478   2         18   .62540256    -.5047519   -.94836488  |
  3. |  1   9.3810953   3         24   .79322826     -.231857   -.93265565  |
  4. |  1   9.0255758   4         30   .96105396    .09833303    -.7712714  |
  5. |  1   8.7406567   5         36   1.1288797    .48581818   -.43748023  |
     |----------------------------------------------------------------------|
  6. |  1   9.0777228   6         42   1.2967054    .93059846    .09544976  |
  7. |  1   7.1115121   7         48   1.4645311    1.4326739    .85425047  |
  8. |  1           .   8         54   1.6323568    1.9920444    1.8656538  |
  9. |  2    8.908965   1          6   .28975116   -.87865634   -.64968587  |
 10. |  2   8.6251503   2         12   .45757686   -.72035168   -.84513099  |
     +----------------------------------------------------------------------+

Each id has between 1 and 8 values for intensity. All values for each individual are contiguous, reflecting the time during the 4 years (8 snapshots at six-month intervals) that they were known to the service. Each snapshot time has around 1,000 observed values.
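The mo1–mo3 columns look like the output of Stata's orthpoly command. As an illustration of how such orthogonal polynomials arise, here is a minimal Python sketch (the function name and the unit-SD scaling are my assumptions; the exact values in the listing depend on the full sample distribution of months_open_60, so this will not reproduce them exactly):

```python
import math

def orthogonal_polys(x, degree=3):
    """Orthogonalise [x, x^2, ..., x^degree] against a constant via
    classical Gram-Schmidt, then scale each column to unit sample SD."""
    n = len(x)
    cols = [[1.0] * n]
    cols += [[float(xi) ** d for xi in x] for d in range(1, degree + 1)]
    basis = []
    for col in cols:
        v = list(col)
        for b in basis:
            # Project the original column onto each previous (orthogonal)
            # basis vector and subtract the projection.
            proj = sum(c * bi for c, bi in zip(col, b)) / sum(bi * bi for bi in b)
            v = [vi - proj * bi for vi, bi in zip(v, b)]
        basis.append(v)
    out = []
    for v in basis[1:]:  # drop the constant column
        sd = math.sqrt(sum(vi * vi for vi in v) / (n - 1))
        out.append([vi / sd for vi in v])
    return out

# Hypothetical usage on the distinct months_open_60 values in the data
months = list(range(-54, 49, 6))
mo1, mo2, mo3 = orthogonal_polys(months)
```

Orthogonalising the polynomial terms this way is what keeps mo1, mo2 and mo3 from being highly collinear, unlike raw powers of months_open_60.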
The value of t, the snapshot time, is arbitrary as far as the individual is concerned, so I am using the number of months they have been open to the service as the predictor variable for time. This means that, although the measurement occasions were fixed in calendar time, the values of my time variables (mo1, mo2, mo3) vary across individuals at those occasions. For this reason, I want to run this as a mixed model, with random time effects to reflect unobserved individual factors, and to add some fixed covariates for demographics later.
The variance of intensity changes over time, and the correlation between observations is greater when they are closer together in time, so I am using the covariance(unstructured) option. This works without problems for a fixed quadratic model with a random linear slope, i.e.
\[ y_{ij} = \beta_{0j} + \beta_{1j}\,mo1_{ij} + \beta_{2}\,mo2_{ij} + e_{ij} \\
\beta_{0j} = \beta_{0} + u_{0j} \\
\beta_{1j} = \beta_{1} + u_{1j} \]
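Concretely, the covariance(unstructured) option leaves the joint distribution of the two random effects unrestricted, so Stata estimates every element of their covariance matrix:

\[ \begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix} \sim N\!\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \sigma^2_{u_0} & \sigma_{u_{01}} \\ \sigma_{u_{01}} & \sigma^2_{u_1} \end{pmatrix} \right), \qquad e_{ij} \sim N(0, \sigma^2_e) \]

These parameters appear in the output below as var(_cons), var(mo1), cov(mo1,_cons) and var(Residual).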
Or in Stata:
Code:
mixed intensity mo1 mo2 || id: mo1, mle variance covariance(unstructured)
That produces this output:

Code:
Performing gradient-based optimization:

Iteration 0:  log likelihood = -13511.573
Iteration 1:  log likelihood = -13507.184
Iteration 2:  log likelihood =  -13507.15
Iteration 3:  log likelihood =  -13507.15

Computing standard errors ...

Mixed-effects ML regression                     Number of obs     =      7,600
Group variable: id                              Number of groups  =      1,760
                                                Obs per group:
                                                              min =          1
                                                              avg =        4.3
                                                              max =          8
                                                Wald chi2(2)      =      14.92
Log likelihood = -13507.15                      Prob > chi2       =     0.0006

------------------------------------------------------------------------------
   intensity | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         mo1 |  -.0823819   .0255335    -3.23   0.001    -.1324266   -.0323373
         mo2 |   .0267544   .0197757     1.35   0.176    -.0120051     .065514
       _cons |   7.256383   .0271316   267.45   0.000     7.203206     7.30956
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
-----------------------------+------------------------------------------------
id: Unstructured             |
                    var(mo1) |   .2055391   .0388611      .1418922    .2977355
                  var(_cons) |   .6037068   .0448357      .5219268    .6983007
              cov(mo1,_cons) |   .0787516   .0252527      .0292572     .128246
-----------------------------+------------------------------------------------
               var(Residual) |    1.58523   .0308387      1.525925     1.64684
------------------------------------------------------------------------------
LR test vs. linear model: chi2(3) = 928.23                Prob > chi2 = 0.0000
However, a likelihood ratio test shows that this model is not preferable to a linear model without the quadratic fixed effect. Intensity is not linear for individuals over time, so I suspect this is because a fixed quadratic effect cannot capture the large variation in intensity. I tried adding a full quadratic random effect, i.e.:

\[ y_{ij} = \beta_{0j} + \beta_{1j}\,mo1_{ij} + \beta_{2j}\,mo2_{ij} + e_{ij} \\
\beta_{0j} = \beta_{0} + u_{0j} \\
\beta_{1j} = \beta_{1} + u_{1j} \\
\beta_{2j} = \beta_{2} + u_{2j} \]
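Adding the random quadratic term expands the unstructured random-effects covariance matrix from 2×2 to 3×3, i.e. three additional (co)variance parameters — var(mo2), cov(mo1,mo2) and cov(mo2,_cons):

\[ \begin{pmatrix} u_{0j} \\ u_{1j} \\ u_{2j} \end{pmatrix} \sim N\!\left( \mathbf{0}, \begin{pmatrix} \sigma^2_{u_0} & \sigma_{u_{01}} & \sigma_{u_{02}} \\ \sigma_{u_{01}} & \sigma^2_{u_1} & \sigma_{u_{12}} \\ \sigma_{u_{02}} & \sigma_{u_{12}} & \sigma^2_{u_2} \end{pmatrix} \right) \]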
Or in Stata code:
Code:
mixed intensity mo1 mo2 || id: mo1 mo2, mle variance covariance(unstructured)
However, I am confused by the output:

Code:
Performing gradient-based optimization:

Iteration 0:  log likelihood = -13626.217
Iteration 1:  log likelihood = -13605.349
Iteration 2:  log likelihood = -13601.943
Iteration 3:  log likelihood = -13601.464
Iteration 4:  log likelihood = -13601.461
Iteration 5:  log likelihood = -13601.461

Computing standard errors ...

Mixed-effects ML regression                     Number of obs     =      7,692
Group variable: id                              Number of groups  =      1,765
                                                Obs per group:
                                                              min =          1
                                                              avg =        4.4
                                                              max =          8
                                                Wald chi2(2)      =      11.98
Log likelihood = -13601.461                     Prob > chi2       =     0.0025

------------------------------------------------------------------------------
   intensity | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
         mo1 |  -.0789441   .0252904    -3.12   0.002    -.1285123   -.0293759
         mo2 |   .0151725   .0221992     0.68   0.494    -.0283372    .0586822
       _cons |   7.246859   .0271523   266.90   0.000     7.193641    7.300077
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
-----------------------------+------------------------------------------------
id: Unstructured             |
                    var(mo1) |    .478652          .             .           .
                    var(mo2) |   .1620386          .             .           .
                  var(_cons) |   .3203482          .             .           .
                cov(mo1,mo2) |   .0301161          .             .           .
              cov(mo1,_cons) |     .06083          .             .           .
              cov(mo2,_cons) |   -.219922          .             .           .
-----------------------------+------------------------------------------------
               var(Residual) |   1.445289          .             .           .
------------------------------------------------------------------------------
LR test vs. linear model: chi2(6) = 1086.28               Prob > chi2 = 0.0000
The model converges, and I do not get an error saying that Stata is unable to produce standard errors for the random parts of the model. Nevertheless, the standard errors for the random-effects parameters are missing.

I cannot use lrtest because the standard errors are missing, but compared with the previous model, twice the change in log-likelihood is 137.304 on 3 degrees of freedom, which would suggest that this model is preferable to the previous one.

However, can I use this log-likelihood figure at all? My sense is that the missing standard errors mean I cannot use the model - that it has somehow failed - but I do not really understand why.
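As a check on that arithmetic, the naive p-value implied by a statistic of 137.304 on 3 degrees of freedom can be computed directly; a minimal sketch in Python using the closed-form chi-squared survival function for 3 df. Two caveats: the two outputs report different estimation samples (7,600 vs 7,692 observations), so the log-likelihoods are strictly comparable only if both models are refit on a common sample; and because the null hypothesis places var(mo2) on the boundary of the parameter space, the naive chi-squared p-value is conservative.

```python
import math

def chi2_sf_3df(x):
    """Survival function P(X > x) for a chi-squared variable with 3 df.
    Closed form: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

lr_stat = 137.304  # 2 * (difference in log-likelihoods), as reported above
p_value = chi2_sf_3df(lr_stat)
print(f"LR = {lr_stat} on 3 df, naive p = {p_value:.2e}")
```

Even allowing for the boundary issue, a statistic that large would be overwhelmingly significant - which is why the missing standard errors, rather than the test itself, are the puzzle.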
I am pretty sure this is happening because I am using months open as the time variable. If I replace it with t, the snapshot period, I do not have this issue. I have also removed the months-open values with very few observations:
Code:
. tab months_open60

months_open |
         60 |      Freq.     Percent        Cum.
------------+-----------------------------------
        -54 |        481        6.49        6.49
        -48 |        465        6.27       12.76
        -42 |        436        5.88       18.64
        -36 |        402        5.42       24.06
        -30 |        384        5.18       29.23
        -24 |        360        4.85       34.09
        -18 |        365        4.92       39.01
        -12 |        355        4.79       43.80
         -6 |        357        4.81       48.61
          0 |        349        4.71       53.32
          6 |        399        5.38       58.70
         12 |        467        6.30       64.99
         18 |        545        7.35       72.34
         24 |        507        6.84       79.18
         30 |        462        6.23       85.41
         36 |        416        5.61       91.02
         42 |        362        4.88       95.90
         48 |        304        4.10      100.00
------------+-----------------------------------
      Total |      7,416      100.00

But I still get the same issue. What is going on here? What do the missing standard errors actually mean? Is my diagnosis along the right lines? Is there a way to keep a meaningful time variable and avoid this issue? I'd be very grateful if anyone could shed any light on this.