
  • Trend analysis

    Dear Listers,

    I plan to run a trend analysis to explore how the original wellbeing score (bad vs. good vs. very good) is associated with recovery (yes/no) after surgery. The data were collected in 2 hospitals (and randomisation was conducted separately in each hospital).

    I was originally planning to use -mhodds-: mhodds recovery wellbeing, by(hospital)

    I can't find anywhere in the documentation whether there is a minimum number of clusters, but 2 seems rather small.

    I am now considering running a trend analysis using logistic regression, assuming linearity:

    logistic recovery i.wellbeing i.hospital
    contrast p.wellbeing

    Any suggestion or resource would be greatly appreciated.

    Thanks,
    Laura


  • #2
    To what were the patients randomized?

    Anyway, both approaches (-mhodds- and logistic regression with -contrast-) seem to control the Type I error rate fairly well with two levels of the stratification factor (hospital). And they agree pretty well with each other, too, at least when the latter converges. Run the code below as a do-file to see for yourself.
    Code:
    version 15.1
    
    clear *
    
    set seed `=strreverse("1498410")'
    
    program define simem, rclass
        version 15.1
        syntax
    
        drop _all
        quietly set obs 2
        generate byte hid = _n
        generate double hid_u = rnormal() // hospital random effect
    
        quietly expand 90
        bysort hid: generate byte wbs = runiformint(1, 3) // wellbeing score
        generate byte rcy = rbinomial(1, normal(hid_u)) // recovery
    
        mhodds rcy wbs, by(hid)
        tempname mhodds
        scalar define `mhodds' = r(p)
    
        logit rcy i.wbs i.hid
        contrast p.wbs
        tempname P
        matrix define `P' = r(p)
        return scalar logit = `P'[1, 1]
        return scalar mhodds = `mhodds'
    end
    
    simulate mhodds = r(mhodds) logit = r(logit), reps(1000) nodots: simem
    
    foreach var of varlist _all {
        generate byte pos_`var' = `var' < 0.05
    }
    
    summarize pos*
    
    list if mi(mhodds, logit) | pos_mhodds != pos_logit, noobs separator(0) abbreviate(20)
    
    exit
    I'm guessing that they'd have about the same power in detecting departures from the no-trend-at-all null hypothesis. It would be an easy modification of the code above to look into that, if you're interested.

    You seem to be interested in linear trend, but more generally "association" could include the quadratic term, as well, with three levels of the wellbeing score.
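
    For instance, here's a minimal, self-contained sketch of how the quadratic component could be examined alongside the linear one. The variable names reuse those from the code above, but the data-generation step and the 0.5 trend coefficient are arbitrary, chosen only so that -contrast- has something to chew on:
    Code:
    version 15.1
    
    clear *
    
    set seed `=strreverse("1498410")'
    
    // arbitrary illustrative data: 90 patients in each of 2 hospitals
    quietly set obs 2
    generate byte hid = _n
    quietly expand 90
    generate byte wbs = runiformint(1, 3)                      // wellbeing score
    generate byte rcy = rbinomial(1, invlogit(0.5 * wbs - 1))  // recovery, with an arbitrary trend
    
    logit rcy i.wbs i.hid, nolog
    contrast p(2).wbs    // quadratic component only
    contrast p.wbs       // linear and quadratic components plus their joint test
    
    exit
    Injecting a nonzero trend into the data-generation step like this is also the kind of modification you would make to the simulation above in order to look at power rather than size.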



    • #3
      Hi Joseph,

      thank you for your answer. This is really helpful! I have some follow-up questions I am hoping you can help me with:

      How would I calculate the power in the code above? Please refer me to external links if easier.

      The -mhodds- command estimates an overall OR in the output labelled MH estimate controlling for the clusters. I have been looking into ways to estimate an overall OR across clusters after running the logistic regression. I tried using the -contrast- command and requesting the effects (contrast p.wellbeing, effe); however, this does not seem to produce equivalent results.

      Given that, in your simulation, -logistic- does not always converge, I am considering -mhodds- to be the more reliable approach; and since the simulation results show that -mhodds- and -logistic- agree, I am inclined to use -mhodds-. Would you think that is sensible?

      Thanks again,
      Laura



      • #4
        For power, you'll need to decide the effect that you'll want to avoid missing. (It's expressed best by Stephen Senn as interpretation 4 in a guest post at this blogsite.) Then you specify that in terms of a linear prediction (log odds ratio) that you would add to the random effect of hospital in the random binomial variate generation step. The basic technique is illustrated below with an arbitrarily chosen slope.

        .
        . version 15.1

        .
        . clear *

        .
        . set seed `=strreverse("1498579")'

        . quietly set obs 2

        . generate byte hid = _n

        . generate double hid_u = rnormal()

        .
        . quietly expand 30

        . generate byte sco = runiformint(1, 3)

        .
        . generate double xb = hid_u + (sco - 1.5) / 3

        . generate byte out = rbinomial(1, invlogit(xb))

        .
        . logistic out c.sco i.hid, nolog

        Logistic regression                             Number of obs     =         60
                                                        LR chi2(2)        =       9.70
                                                        Prob > chi2       =     0.0078
        Log likelihood = -29.944814                     Pseudo R2         =     0.1394

        ------------------------------------------------------------------------------
                 out | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 sco |   1.409998    .599525     0.81   0.419     .6127611    3.244484
               2.hid |   7.467537   5.423379     2.77   0.006     1.798801    31.00071
               _cons |   .0506929   .0596388    -2.53   0.011     .0050528    .5085833
        ------------------------------------------------------------------------------
        Note: _cons estimates baseline odds.

        .
        . mhodds out sco, by(hid)

        Score test for trend of odds with sco
        by hid

        (The Odds Ratio estimate is an approximation to the odds ratio
        for a one unit increase in sco)

        -------------------------------------------------------------------------------
              hid | Odds Ratio        chi2(1)         P>chi2       [95% Conf. Interval]
        ----------+--------------------------------------------------------------------
                1 |   1.256747           0.09         0.7624         0.28567    5.52882
                2 |   1.438358           0.57         0.4505         0.55957    3.69726
        -------------------------------------------------------------------------------

            Mantel-Haenszel estimate controlling for hid
            ----------------------------------------------------------------
             Odds Ratio    chi2(1)        P>chi2        [95% Conf. Interval]
            ----------------------------------------------------------------
               1.383364       0.64        0.4244         0.623975   3.066943
            ----------------------------------------------------------------

        Test of homogeneity of ORs (approx): chi2(1)   =    0.02
                                             Pr>chi2   =  0.8803

        .
        . exit

        end of do-file


        About your concern that the two methods do not seem to produce equivalent results: as its help file says, -mhodds- computes an approximation to the maximum likelihood (logistic regression) estimate. Also, with more than two categories in the explanatory variable, it treats the variable as continuous, which is analogous to the c. factor-variable notation in a regression context (see what I've used above for illustration). In the example above, -mhodds- gets an (approximate) odds ratio of 1.4 with a chi-square test statistic of 0.64 and a p-value of 0.42, while logistic regression gets a maximum likelihood estimate of 1.4 and a chi-square test statistic of 0.81² = 0.66 with a p-value of 0.42.

        If your sample size is reasonably large and suitable for maximum likelihood methods, I would probably not worry too much about convergence with logit.
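
        If it helps, here is a minimal sketch of how the illustration above could be wrapped into a simulation to estimate power for that arbitrarily chosen slope. The program name simpow, the 30 patients per hospital, and the use of c.sco with -test- are illustrative choices only; you would swap in the sample sizes and effect size that matter to you (and increase reps() for more precision).
        Code:
        version 15.1
        
        clear *
        
        set seed `=strreverse("1498579")'
        
        program define simpow, rclass
            version 15.1
            syntax
        
            drop _all
            quietly set obs 2
            generate byte hid = _n
            generate double hid_u = rnormal()                // hospital random effect
        
            quietly expand 30
            generate byte sco = runiformint(1, 3)            // wellbeing score
            generate double xb = hid_u + (sco - 1.5) / 3     // arbitrarily chosen trend (log odds)
            generate byte out = rbinomial(1, invlogit(xb))   // recovery
        
            mhodds out sco, by(hid)
            return scalar mhodds = r(p)
        
            logit out c.sco i.hid
            test sco
            return scalar logit = r(p)
        end
        
        simulate mhodds = r(mhodds) logit = r(logit), reps(1000) nodots: simpow
        
        // estimated power = proportion of replications rejecting at the 0.05 level
        foreach var of varlist mhodds logit {
            generate byte pow_`var' = `var' < 0.05 if !missing(`var')
        }
        summarize pow_*
        
        exit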



        • #5
          Thanks again for such a detailed response. After this discussion, I would just like to check my planned analyses:

          If I decided to run a logistic regression, I should first test whether the ordinal predictor should be modelled as continuous or categorical using a likelihood ratio test.
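
          For example (a sketch only, regenerating the simulated rcy, wbs, and hid from the earlier posts so the commands have something to run on):
          Code:
          * sketch only: regenerate the simulated data from the earlier posts
          clear
          set seed `=strreverse("1498579")'
          quietly set obs 2
          generate byte hid = _n
          generate double hid_u = rnormal()               // hospital random effect
          quietly expand 90
          generate byte wbs = runiformint(1, 3)           // wellbeing score
          generate byte rcy = rbinomial(1, normal(hid_u)) // recovery
          
          * likelihood ratio test: linear (continuous) vs. categorical coding of wbs
          quietly logistic rcy c.wbs i.hid
          estimates store linear
          quietly logistic rcy i.wbs i.hid
          estimates store categorical
          lrtest linear categorical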

          If the test showed that the model with the predictor treated as continuous is sufficient, I could use the overall OR for wellbeing as in your latest message, which is also approximately equivalent to the -mhodds- estimate.

          However, if the test were significant, I would need to model the variable as categorical. In this case, I can run the analysis as:

          logistic rcy i.wbs i.hid
          contrast p.wbs

          The latter line produces a trend chi-square test akin to the one obtained when the predictor is entered as continuous. Looking at the forum and other resources, I found advice to use the -sheafcoef- command to get an overall estimate for wellbeing, although it is then a latent construct.

          I am unsure how best to interpret the results from -sheafcoef-: based on the help file, the OR for the latent variable is always positive unless I set one of the variables to be a key variable, but I am unsure about the best approach.


          I find the results from the two approaches quite different.

          . clear *

          .
          . set seed `=strreverse("1498579")'

          .
          . qui set obs 2

          .
          . g byte hid = _n

          .
          . g double hid_u = rnormal()

          .
          . quietly expand 90

          . bysort hid: generate byte wbs = runiformint(1, 3) // wellbeing score

          . generate byte rcy = rbinomial(1, normal(hid_u)) // recovery

          *** Predictor as continuous

          . logistic rcy c.wbs i.hid

          Logistic regression                             Number of obs     =        180
                                                          LR chi2(2)        =      18.98
                                                          Prob > chi2       =     0.0001
          Log likelihood = -49.027219                     Pseudo R2         =     0.1621

          ------------------------------------------------------------------------------
                   rcy | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   wbs |   .8755642    .288746    -0.40   0.687     .4587504    1.671089
                 2.hid |   20.13543   21.00681     2.88   0.004     2.605693    155.5961
                 _cons |   .0148139   .0179193    -3.48   0.000     .0013837    .1586008
          ------------------------------------------------------------------------------
          Note: _cons estimates baseline odds.


          *** Predictor as categorical

          tab wbs, g(ws)
          qui: logistic rcy ws2 ws3 hid

          . sheafcoef, latent(ws: ws2 ws3) eform
          ------------------------------------------------------------------------------
                   rcy |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
          main         |
                  ws_e |   1.171548   .3402035     3.44   0.001     .5047609    1.838334
                 hid_e |   20.08222   20.95061     0.96   0.338    -20.98022    61.14465
               _cons_e |   .0006096    .001266     0.48   0.630    -.0018717     .003091
          -------------+----------------------------------------------------------------
          on_ws        |
                   ws2 |   .3537425   3.722849     0.10   0.924    -6.942907    7.650392
                   ws3 |  -1.917714     2.4154    -0.79   0.427     -6.65181    2.816382
          ------------------------------------------------------------------------------
          (_e) indicates the variables whose coefficients have been exponentiated



          • #6
            I don't know that much about the user-written sheafcoef, and so I can't advise you there, but it kind of strikes me as overkill, as if you're drifting away from the objective you gave in your first post ("to explore how original wellbeing score . . . is associated with recovery . . . after surgery").

            Could you perhaps explore the relationship between the two variables graphically, something along the lines of a dot chart?
            Code:
            version 15.1
            
            clear *
            set more on
            
            set seed `=strreverse("1498631")'
            
            quietly set obs 2
            generate byte hid = _n
            label define Hospitals 1 `" "City" "General" "' 2 `" "University" "Medical" "Center" "'
            label values hid Hospitals
            label variable hid "Hospital ID"
            generate double hid_u = rnormal()
            
            quietly expand 90
            
            generate byte wbs = runiformint(1, 3)
            label define WellbeingScores 1 Bad 2 Good 3 "Very Good"
            label values wbs WellbeingScores
            
            generate double xb = hid_u + (wbs - 1.5)
            
            generate byte rcy = rbinomial(1, invlogit(xb))
            
            // Maybe
            graph dot (mean) rcy, over(wbs) over(hid) ///
                ylabel( , format(%02.1f)) ytitle(Recovery after Surgery (Proportion))
            
            more
            
            // or maybe
            graph dot (mean) rcy, over(hid) asyvars over(wbs) ///
                ylabel( , format(%02.1f)) ytitle(Recovery after Surgery (Proportion)) ///
                    legend(off)
            
            exit



            • #7
              Thanks again for your reply. I agree that a graphical representation may be best, as -sheafcoef- creates another level of complexity by relying on latent variables.

