
  • 3-level ordinal outcome measured at 3 time points (repeated measures)

    Hello Statalisters. I have been approached for advice about how to analyze a 3-level ordinal outcome variable that has been measured on 3 different occasions. The subjects are surgical patients, and the occasions are pre-op, 3 months post-op, and 12 months post-op. I gather there are actually multiple outcomes, but so far, I have seen data for only one of them:

    Code:
    * Example generated by -dataex-. To install: ssc install dataex
    clear
    input float id byte(time y)
     1 0 2
     1 1 2
     1 2 2
     2 0 1
     2 1 2
     2 2 2
     3 0 2
     3 1 2
     3 2 2
     4 0 2
     4 1 2
     4 2 2
     5 0 1
     5 1 2
     5 2 2
     6 0 1
     6 1 2
     6 2 2
     7 0 1
     7 1 2
     7 2 2
     8 0 0
     8 1 2
     8 2 2
     9 0 1
     9 1 2
     9 2 2
    10 0 0
    10 1 2
    10 2 2
    11 0 2
    11 1 2
    11 2 2
    12 0 0
    12 1 2
    12 2 1
    13 0 1
    13 1 2
    13 2 2
    14 0 1
    14 1 2
    14 2 2
    15 0 2
    15 1 2
    15 2 2
    16 0 1
    16 1 2
    16 2 2
    17 0 2
    17 1 2
    17 2 2
    end
    The researchers wish to test the null hypothesis that the proportion of patients in each category does not change over time.

    My first thought was that if there were only two time points, one could use McNemar's chi-square test, and I wondered whether there is an extension of it to three occasions. But as I write this, I suppose that McNemar's test, or an extension thereof, would ignore the ordinal nature of the variable, making it less than ideal.

    My next thought was that one could use a form of ordinal logistic regression that can accommodate the dependencies in the repeated measures. Perhaps one of the following, for example:

    Code:
    ologit y i.time, vce(cluster id)
    meologit y i.time || id:
    But as you'll see if you try these commands using the data I posted above, there are problems.

    The -ologit- command runs, but this warning is included below the table of coefficients:
    Note: 17 observations completely determined. Standard errors questionable.
    The -meologit- command iterates 11 times before displaying "Refining starting values", and then appears as if it will iterate forever (with a not concave warning for each iteration after the first few).

    I assume some of this has to do with the fact that virtually all patients are in the 3rd category after the first time point.

    Code:
    . tabulate y time
    
               |               time
             y |         0          1          2 |     Total
    -----------+---------------------------------+----------
             0 |         3          0          0 |         3
             1 |         8          0          1 |         9
             2 |         6         17         16 |        39
    -----------+---------------------------------+----------
         Total |        17         17         17 |        51

    As I've said, there are some other outcome variables I've not seen yet, so I don't know if this same pattern will occur for them, but it would not surprise me if it did.

    I welcome any thoughts you might have about how to proceed.

    Cheers,
    Bruce



    --
    Bruce Weaver
    Email: [email protected]
    Version: Stata/MP 18.5 (Windows)

  • #2
    It's not particularly powerful, but if nothing else pans out, then how about Friedman's test?

    Code:
    . version 16.1

    . clear *

    . quietly input byte (id time y)

    . emh y time, strata(id) anova transformation(rank)

    Extended Mantel-Haenszel (Cochran-Mantel-Haenszel) Stratified Test of Association

    ANOVA (Row Mean Scores) Statistic:
    Q (2) = 21.4118, P = 0.0000
    Transformation: Ranks

    . /* quietly reshape wide y, i(id) j(time)
    > signtest y0 = y1
    > signtest y0 = y2 */

    . exit

    end of do-file
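    (For anyone who wants to run this: -emh- is a user-written command; assuming it is the SSC package of the same name, it can be installed as shown below.)

    Code:
    ssc install emh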

    Comment


    • #3
      Interesting suggestion, Joseph. Thanks. It occurs to me that where you suggested -signtest-, I could use -symmetry- to get pairwise tests of marginal homogeneity.

      Code:
      reshape wide y, i(id) j(time)   // creates y0, y1, y2, as in Joseph's commented-out code
      symmetry y0 y1
      symmetry y0 y2
      symmetry y1 y2
      Doing that for the data I have been given (thus far), I get these results:

      Code:
      . symmetry y0 y1
      
      --------------------------------------
                |            1 y            
            0 y |   0      1      2    Total
      ----------+---------------------------
              0 |    0      0      3      3
              1 |    0      0      8      8
              2 |    0      0      6      6
                |
          Total |    0      0     17     17
      --------------------------------------
      
                                                   chi2     df      Prob>chi2
      ------------------------------------------------------------------------
      Symmetry (asymptotic)                 |     11.00      2         0.0041
      Marginal homogeneity (Stuart-Maxwell) |     11.00      2         0.0041
      ------------------------------------------------------------------------
      
      . symmetry y0 y2
      
      --------------------------------------
                |            2 y            
            0 y |   0      1      2    Total
      ----------+---------------------------
              0 |    0      1      2      3
              1 |    0      0      8      8
              2 |    0      0      6      6
                |
          Total |    0      1     16     17
      --------------------------------------
      
                                                   chi2     df      Prob>chi2
      ------------------------------------------------------------------------
      Symmetry (asymptotic)                 |     11.00      3         0.0117
      Marginal homogeneity (Stuart-Maxwell) |     10.38      2         0.0056
      ------------------------------------------------------------------------
      
      . symmetry y1 y2
      
      -------------------------------
                |         2 y        
            1 y |   1      2    Total
      ----------+--------------------
              1 |    0      0      0
              2 |    1     16     17
                |
          Total |    1     16     17
      -------------------------------
      
                                                   chi2     df      Prob>chi2
      ------------------------------------------------------------------------
      Symmetry (asymptotic)                 |      1.00      1         0.3173
      Marginal homogeneity (Stuart-Maxwell) |      1.00      1         0.3173
      ------------------------------------------------------------------------

      Those results match very nicely with the calibrated eye-ball test, I think. The proportions at time 1 and time 2 look different from those at time 0, and there is no big difference between times 1 and 2. (Those proportions can be displayed directly; see below.)
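      A quick way to display the row proportions being eyeballed, run on the long-format data (i.e., before the -reshape-):

      Code:
      tabulate time y, row nofreq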

      --
      Bruce Weaver
      Email: [email protected]
      Version: Stata/MP 18.5 (Windows)

      Comment


      • #4
        Bruce, what about defining a variable that reflects the sequential pattern for a subject, and analyzing that?
        Code:
        reshape wide y, i(id) j(time)
        egen pattern = concat(y*)
        tab pattern
        While I can appreciate that considering a potential list of 81 patterns might be cumbersome, I would think that in a clinical context, there might be some natural reduced categorization of these patterns, and that level of detail might be interesting.

        For example, for the case with two time periods, I have treated data like this as a 3-level ordinal outcome (1 = getting worse, 2 = staying the same, 3 = getting better) and then analyzed that. To me, by not discarding the ties (as some kind of quasi-McNemar approach would), one keeps more of the interesting information. A sketch of that recoding follows.
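        A minimal, hypothetical sketch of that recoding for the pre-op (y0) to 12-month (y2) pair, assuming the data have already been reshaped wide as above and that a higher category of y is clinically better; change02 and chg are illustrative names:

        Code:
        * hypothetical sketch: 3-level change score from y0 to y2
        * (assumes higher y = better; reverse the coding if not)
        generate byte change02 = cond(y2 < y0, 1, cond(y2 == y0, 2, 3))
        label define chg 1 "getting worse" 2 "staying the same" 3 "getting better"
        label values change02 chg
        tabulate change02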

        Also, what about sequence analysis (about which I know nothing other than thinking the idea is cool): -ssc describe sq-

        Comment


        • #5
          Hi Mike. That's another interesting suggestion. For the data I've been given, there are 4 patterns:

          Code:
          . tab pattern
          
              pattern |      Freq.     Percent        Cum.
          ------------+-----------------------------------
                  021 |          1        5.88        5.88
                  022 |          2       11.76       17.65
                  122 |          8       47.06       64.71
                  222 |          6       35.29      100.00
          ------------+-----------------------------------
                Total |         17      100.00

          It's not clear to me how I would analyze that. A chi-square goodness-of-fit test comes to mind (or an exact multinomial test, given the small n), but it's not clear (to me) what the expected frequencies should be. This is complicated by the number of possible sequences. (You said 81, but 3 x 3 x 3 = 27.)


          Even if those problems could be solved, I'm not sure that analysis of the sequences exactly fits the research question. Here is an exact quote of what I received from one of the researchers:


          I am trying to compare categorical data at 3 time points.

          Specifically, looking at proportion of patients in each category of pain scores (3 categories: mild, moderate, severe), at pre-op, post op 6 months and post of 12 months. Null Hypothesis being there will be no change.

          Obviously, one of the outcome variables (for which I have not yet seen any data) will be pain. (The data I shared earlier were for range of motion.) I think the pairwise tests of marginal homogeneity are probably addressing that research question pretty closely, and in a way that is fairly understandable for the researchers (and their eventual readers, should they get published). There is a multiplicity problem in doing all of those pairwise tests. But if the researchers could identify one of those comparisons as primary (e.g., pre-op vs 12 months) and treat the other two as secondary, that would help. (A sketch of a simple Bonferroni adjustment follows.)
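          As a rough illustration of the multiplicity point, one could Bonferroni-adjust the three pairwise Stuart-Maxwell p-values from #3; the arithmetic is shown below for concreteness:

          Code:
          * Bonferroni adjustment of the three pairwise Stuart-Maxwell p-values (#3)
          display "time 0 vs 1: adj p = " %6.4f min(1, 3*0.0041)
          display "time 0 vs 2: adj p = " %6.4f min(1, 3*0.0056)
          display "time 1 vs 2: adj p = " %6.4f min(1, 3*0.3173)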

          I am reminded of the signature file of someone (name now forgotten) who used to post regularly to the sci.stat.* usenet newsgroups back in the day. It said something like:
          All too often, analysis of data requires thought!
          Cheers,
          Bruce



          --
          Bruce Weaver
          Email: [email protected]
          Version: Stata/MP 18.5 (Windows)

          Comment


          • #6
            I just found an article by Alan Agresti (1983) that bears on this problem. I have not read it yet, but here it is for others who may be interested.
            --
            Bruce Weaver
            Email: [email protected]
            Version: Stata/MP 18.5 (Windows)

            Comment


            • #7
              Originally posted by Bruce Weaver View Post
              ...

              I assume some of this has to do with the fact that virtually all patients are in the 3rd category after the first time point.

               Code:
               . tabulate y time

                          |               time
                        y |         0          1          2 |     Total
               -----------+---------------------------------+----------
                        0 |         3          0          0 |         3
                        1 |         8          0          1 |         9
                        2 |         6         17         16 |        39
               -----------+---------------------------------+----------
                    Total |        17         17         17 |        51

              As I've said, there are some other outcome variables I've not seen yet, so I don't know if this same pattern will occur for them, but it would not surprise me if it did.

              ...
              Never mind which statistical test you use, I think that this is fundamentally the issue with this particular outcome. You have 17 people. Basically everyone is in the highest outcome category at time 1 and time 2.

               If you limit your meologit command to 25 iterations, you get the output below (iteration log truncated).
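               Presumably via the -iterate()- maximization option, i.e., something like this sketch (the exact call may have differed):

               Code:
               meologit y i.time || id:, iterate(25)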

               Code:
               convergence not achieved

               Mixed-effects ologit regression                 Number of obs     =         51
               Group variable: id                              Number of groups  =         17

                                                               Obs per group:
                                                                             min =          3
                                                                             avg =        3.0
                                                                             max =          3

               Integration method: ghermite                    Integration pts.  =          7

                                                               Wald chi2(0)      =          .
               Log likelihood = -20.933424                     Prob > chi2       =          .
               ------------------------------------------------------------------------------
                          y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
               -------------+----------------------------------------------------------------
                       time |
                         1  |   99.58246          .        .       .            .           .
                         2  |    30.2977          .        .       .            .           .
               -------------+----------------------------------------------------------------
                      /cut1 |  -21.83704          .                            .           .
                      /cut2 |  -5.031812   1.651368                    -8.268434    -1.79519
               -------------+----------------------------------------------------------------
               id           |
                  var(_cons)|   486.4787          .                            .           .
               ------------------------------------------------------------------------------
               Note: The above coefficient values are the result of non-adaptive quadrature
                     because the adaptive parameters could not be computed.
               Warning: convergence not achieved
               Basically, the parameter estimates for t = 1 and t = 2 are nonsensical. Do you really think that, in a large sample, the log-odds of a higher outcome at time 1 would be that much higher than at time 2? Also, I'm not deeply familiar with the scale of logit and ordered logit mixed models, but the variance of the random intercept looks like it is wandering off to infinity. Any parameter with a missing standard error is probably contributing to the convergence failure.

               It could be that the cure is a larger sample, but that is presumably impossible. Collapsing categories of y, or categories of time, might be the next thing to try; I tried both, and neither was satisfactory. (A sketch of the category-collapsing idea appears below.)
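               For concreteness, a minimal sketch of the category-collapsing idea: merge y = 0 and y = 1 and fit a mixed-effects logistic model on the resulting binary outcome. The variable name yhi is illustrative, and this is not necessarily the exact model I tried:

               Code:
               * hypothetical sketch: collapse y to binary, then refit
               generate byte yhi = (y == 2)
               melogit yhi i.time || id: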

              In principle, Bayesian priors would stabilize the parameter estimates. Using the default priors, I come up with this estimate:

              Code:
              bayes: meologit y i.time || id:
                
              Burn-in 2500 aaaaaaaaa1000aaaaaaaaa2000aaaaa done
              Simulation 10000 .........1000.........2000.........3000.........4000.........5000.........6000.........7
              > 000.........8000.........9000.........10000 done
              
              Multilevel structure
              ------------------------------------------------------------------------------
              id
                  {U0}: random intercepts
              ------------------------------------------------------------------------------
              
              Model summary
              ------------------------------------------------------------------------------
              Likelihood: 
                y ~ meologit(xb_y,{cut1} {cut2})
              
              Priors: 
                 {y:i.time} ~ normal(0,10000)                                            (1)
                       {U0} ~ normal(0,{U0:sigma2})                                      (1)
                {cut1 cut2} ~ 1 (flat)
              
              Hyperprior: 
                {U0:sigma2} ~ igamma(.01,.01)
              ------------------------------------------------------------------------------
              (1) Parameters are elements of the linear form xb_y.
              
              Bayesian multilevel ologit regression            MCMC iterations  =     12,500
              Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                               MCMC sample size =     10,000
              Group variable: id                               Number of groups =         17
              
                                                               Obs per group:
                                                                            min =          3
                                                                            avg =        3.0
                                                                            max =          3
              
              Family : ordinal                                 Number of obs    =         51
              Link   : logit                                   Acceptance rate  =      .4566
                                                               Efficiency:  min =    .001307
                                                                            avg =     .04392
              Log marginal-likelihood                                       max =      .0739
               
              ------------------------------------------------------------------------------
                           |                                                Equal-tailed
                           |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
              -------------+----------------------------------------------------------------
              y            |
                      time |
                        1  |  84.13453   60.38645   2.59789   71.59126   7.595737     228.43
                        2  |  4.383141   1.523439   .081593   4.245801   1.987244   8.148178
              -------------+----------------------------------------------------------------
                      cut1 | -1.981003   .7643575   .028117  -1.962368  -3.634399  -.6207731
                      cut2 |  .7547874   .5847946   .024829   .7333943  -.3520434   1.959883
              -------------+----------------------------------------------------------------
              id           |
                 U0:sigma2 |  .6394781   .4953645   .137007   .6529468   .0116704   1.698588
              ------------------------------------------------------------------------------
              Note: Default priors are used for model parameters.
              Note: There is a high autocorrelation after 500 lags.
               The variance of the random intercept is now more sensible, but you still have the basic problem that the log-odds of a higher response at t = 1 are much higher than at t = 2. Furthermore, you would still need to run the usual Bayesian diagnostics, as the last note in the output suggests. I didn't do that, partly because I'm not familiar with those diagnostics, and partly because Bayesian estimation doesn't save your bacon here either: you need either a larger sample or some justifiably informative priors. Or, possibly, the default vague priors just don't work in this case and there are better non-informative priors that one could have used. (The usual starting point for those diagnostics is sketched below.)
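               For reference, a sketch of where one might start with those diagnostics, run after the -bayes: meologit- call above; {y:1.time} is singled out only because it has the worst sampling efficiency in this output:

               Code:
               bayesgraph diagnostics {y:1.time}   // trace, autocorrelation, histogram, density
               bayesstats ess                      // effective sample sizes for all parameters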
               --
               Be aware that it can be very hard to answer a question without sample data. You can use the -dataex- command for this. Type -help dataex- at the command line.

               When presenting code or results, please use code delimiters to format them. Use the # button on the formatting toolbar, between the " (double quote) and <> buttons.

              Comment


              • #8
                For me, approaching this depends on the alternative hypothesis of interest. If we want an alternative such as "y gets bigger over time" (vs. "no association between y and time"), and presuming we want an analysis that appropriately takes account of the within-subject aspect of this study, I think there is an analysis using the -somersd- package. It will calculate Somers' D or a stratified version of it, and that stratified analysis can even use subject id to define the strata. So, with your example data above, here's something that might be relevant:

                Code:
                // Ignoring the within-subject aspect of the design
                tab time y
                somersd time y    // The convention of -somersd- is that the response variable is last in the list.
                // Example, using just one informative subject (i.e., one who has variation in y)
                tab time y if id == 2
                somersd time y if id == 2
                // All subjects, stratifying on id
                somersd time y, wstrata(id)
                I believe the stratified estimate is something akin to a variance-weighted mean of the estimates from each stratum.
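                (-somersd- is user-written; it is available from SSC.)

                Code:
                ssc install somersd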

                Roger Newson, the author of this package, documents and explains the Somers' D statistic nicely in the article accompanying it, and shows that this 1962 measure of association (largely lost to history) is a more fundamental and elegant statistic than is generally known. He's a sometime participant here, so perhaps he will comment on whether my suggestion is reasonable.

                This analysis is likely in the spirit of some of the stuff in the Agresti article, if I recall it correctly.

                Comment


                • #9
                  Thanks Weiwen & Mike. I'll keep an eye out for connections to Somers' D when I eventually find time to read that Agresti article. And looking forward to Roger's comments, should he jump in.

                  Cheers,
                  Bruce
                  --
                  Bruce Weaver
                  Email: [email protected]
                  Version: Stata/MP 18.5 (Windows)

                  Comment
