
  • Trend analysis

    Dear Listers,

    I plan to run a trend analysis to explore how the original wellbeing score (bad vs. good vs. very good) is associated with recovery (yes/no) after surgery. The data were collected in 2 hospitals (and randomisation was conducted separately in each hospital).

    I was originally planning to use -mhodds-: mhodds recovery wellbeing, by(hospital)

    I can't find anywhere in the documentation whether there is a minimum number of clusters, but 2 seems rather small.

    I am now considering running a trend analysis using logistic regression, assuming linearity:

    logistic recovery i.wellbeing i.hospital
    contrast p.wellbeing

    Any suggestion or resource would be greatly appreciated.

    Thanks,
    Laura


  • #2
    To what were the patients randomized?

    Anyway, both approaches (-mhodds- and logistic regression with -contrast-) seem to control the Type I error rate fairly well with two levels of the stratification factor (hospital). And they agree pretty well with each other, too, at least when the latter converges. Run the code below as a do-file to see for yourself.
    Code:
    version 15.1
    
    clear *
    
    set seed `=strreverse("1498410")'
    
    program define simem, rclass
        version 15.1
        syntax
    
        drop _all
        quietly set obs 2
        generate byte hid = _n
        generate double hid_u = rnormal() // hospital random effect
    
        quietly expand 90
        bysort hid: generate byte wbs = runiformint(1, 3) // wellbeing score
        generate byte rcy = rbinomial(1, normal(hid_u)) // recovery
    
        mhodds rcy wbs, by(hid)
        tempname mhodds
        scalar define `mhodds' = r(p)
    
        logit rcy i.wbs i.hid
        contrast p.wbs
        tempname P
        matrix define `P' = r(p)
        return scalar logit = `P'[1, 1]
        return scalar mhodds = `mhodds'
    end
    
    simulate mhodds = r(mhodds) logit = r(logit), reps(1000) nodots: simem
    
    foreach var of varlist _all {
        generate byte pos_`var' = `var' < 0.05
    }
    
    summarize pos*
    
    list if mi(mhodds, logit) | pos_mhodds != pos_logit, noobs separator(0) abbreviate(20)
    
    exit
    I'm guessing that they'd have about the same power in detecting departures from the no-trend-at-all null hypothesis. It would be an easy modification of the code above to look into that, if you're interested.

    You seem to be interested in linear trend, but more generally "association" could include the quadratic term, as well, with three levels of the wellbeing score.
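
    For instance, here's a minimal, self-contained sketch of how the quadratic component could be examined alongside the linear one. The variable names reuse those from the code above, but the data-generation step and the 0.5 trend coefficient are arbitrary, chosen only so that -contrast- has something to chew on:
    Code:
    version 15.1
    
    clear *
    
    set seed `=strreverse("1498410")'
    
    // arbitrary illustrative data: 90 patients in each of 2 hospitals
    quietly set obs 2
    generate byte hid = _n
    quietly expand 90
    generate byte wbs = runiformint(1, 3)                      // wellbeing score
    generate byte rcy = rbinomial(1, invlogit(0.5 * wbs - 1))  // recovery, with an arbitrary trend
    
    logit rcy i.wbs i.hid, nolog
    contrast p(2).wbs    // quadratic component only
    contrast p.wbs       // linear and quadratic components plus their joint test
    
    exit
    Injecting a nonzero trend into the data-generation step like this is also the kind of modification you would make to the simulation above in order to look at power rather than size.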



    • #3
      Hi Joseph,

      thank you for your answer. This is really helpful! I have some follow-up questions I am hoping you can help me with:

      How would I calculate the power in the code above? Please refer me to external links if easier.

      The -mhodds- command estimates an overall OR in the output labelled MH estimate controlling for the clusters. I have been looking into ways to estimate an overall OR across clusters after running the logistic regression. I tried using the -contrast- command and requesting the effects (contrast p.wellbeing, effe); however, this does not seem to produce equivalent results.

      Given that, in your simulation, -logistic- does not always converge, I am considering -mhodds- to be the more reliable approach; and since the simulation results show that -mhodds- and -logistic- agree, I am inclined to use -mhodds-. Would you think that is sensible?

      Thanks again,
      Laura



      • #4
        For power, you'll need to decide the effect that you'll want to avoid missing. (It's expressed best by Stephen Senn as interpretation 4 in a guest post at this blogsite.) Then you specify that in terms of a linear prediction (log odds ratio) that you would add to the random effect of hospital in the random binomial variate generation step. The basic technique is illustrated below with an arbitrarily chosen slope.

        .
        . version 15.1

        .
        . clear *

        .
        . set seed `=strreverse("1498579")'

        . quietly set obs 2

        . generate byte hid = _n

        . generate double hid_u = rnormal()

        .
        . quietly expand 30

        . generate byte sco = runiformint(1, 3)

        .
        . generate double xb = hid_u + (sco - 1.5) / 3

        . generate byte out = rbinomial(1, invlogit(xb))

        .
        . logistic out c.sco i.hid, nolog

        Logistic regression                             Number of obs     =         60
                                                        LR chi2(2)        =       9.70
                                                        Prob > chi2       =     0.0078
        Log likelihood = -29.944814                     Pseudo R2         =     0.1394

        ------------------------------------------------------------------------------
                 out | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
        -------------+----------------------------------------------------------------
                 sco |   1.409998    .599525     0.81   0.419     .6127611    3.244484
               2.hid |   7.467537   5.423379     2.77   0.006     1.798801    31.00071
               _cons |   .0506929   .0596388    -2.53   0.011     .0050528    .5085833
        ------------------------------------------------------------------------------
        Note: _cons estimates baseline odds.

        .
        . mhodds out sco, by(hid)

        Score test for trend of odds with sco
        by hid

        (The Odds Ratio estimate is an approximation to the odds ratio
        for a one unit increase in sco)

        -------------------------------------------------------------------------------
              hid | Odds Ratio        chi2(1)         P>chi2       [95% Conf. Interval]
        ----------+--------------------------------------------------------------------
                1 |   1.256747           0.09         0.7624         0.28567    5.52882
                2 |   1.438358           0.57         0.4505         0.55957    3.69726
        -------------------------------------------------------------------------------

            Mantel-Haenszel estimate controlling for hid
            ----------------------------------------------------------------
             Odds Ratio    chi2(1)        P>chi2        [95% Conf. Interval]
            ----------------------------------------------------------------
               1.383364       0.64        0.4244         0.623975   3.066943
            ----------------------------------------------------------------

        Test of homogeneity of ORs (approx): chi2(1)   =    0.02
                                             Pr>chi2   =  0.8803

        .
        . exit

        end of do-file


        About your concern that the two methods do not seem to produce equivalent results: as its help file says, -mhodds- computes an approximation to the maximum likelihood (logistic regression) estimate. Also, with more than two categories in the explanatory variable, it treats the variable as continuous, which is analogous to the c. factor-variable notation in a regression context (see what I've used above for illustration). In the example above, -mhodds- gets an (approximate) odds ratio of 1.4 with a chi-square test statistic of 0.64 and a p-value of 0.42, while logistic regression gets a maximum likelihood estimate of 1.4 and a chi-square test statistic of 0.81² = 0.66 with a p-value of 0.42.

        If your sample size is reasonably large and suitable for maximum likelihood methods, I would probably not worry too much about convergence with logit.
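
        If it helps, here is a minimal sketch of how the illustration above could be wrapped into a simulation to estimate power for that arbitrarily chosen slope. The program name simpow, the 30 patients per hospital, and the use of c.sco with -test- are illustrative choices only; you would swap in the sample sizes and effect size that matter to you (and increase reps() for more precision).
        Code:
        version 15.1
        
        clear *
        
        set seed `=strreverse("1498579")'
        
        program define simpow, rclass
            version 15.1
            syntax
        
            drop _all
            quietly set obs 2
            generate byte hid = _n
            generate double hid_u = rnormal()                // hospital random effect
        
            quietly expand 30
            generate byte sco = runiformint(1, 3)            // wellbeing score
            generate double xb = hid_u + (sco - 1.5) / 3     // arbitrarily chosen trend (log odds)
            generate byte out = rbinomial(1, invlogit(xb))   // recovery
        
            mhodds out sco, by(hid)
            return scalar mhodds = r(p)
        
            logit out c.sco i.hid
            test sco
            return scalar logit = r(p)
        end
        
        simulate mhodds = r(mhodds) logit = r(logit), reps(1000) nodots: simpow
        
        // estimated power = proportion of replications rejecting at the 0.05 level
        foreach var of varlist mhodds logit {
            generate byte pow_`var' = `var' < 0.05 if !missing(`var')
        }
        summarize pow_*
        
        exit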



        • #5
          Thanks again for such a detailed response. After this discussion, I would just like to check my planned analyses:

          If I decided to run a logistic regression, I should first test whether the ordinal predictor should be modelled as continuous or categorical using a likelihood ratio test.
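
          For example (a sketch only, regenerating the simulated rcy, wbs, and hid from the earlier posts so the commands have something to run on):
          Code:
          * sketch only: regenerate the simulated data from the earlier posts
          clear
          set seed `=strreverse("1498579")'
          quietly set obs 2
          generate byte hid = _n
          generate double hid_u = rnormal()               // hospital random effect
          quietly expand 90
          generate byte wbs = runiformint(1, 3)           // wellbeing score
          generate byte rcy = rbinomial(1, normal(hid_u)) // recovery
          
          * likelihood ratio test: linear (continuous) vs. categorical coding of wbs
          quietly logistic rcy c.wbs i.hid
          estimates store linear
          quietly logistic rcy i.wbs i.hid
          estimates store categorical
          lrtest linear categorical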

          If the test showed that the model with the predictor treated as continuous is sufficient, I could use the overall OR for wellbeing as in your latest message, which is also approximately equivalent to the -mhodds- estimate.

          However, if the test were significant, I would need to model the variable as categorical. In this case, I can run the analysis as:

          logistic rcy i.wbs i.hid
          contrast p.wbs

          The latter line produces a trend chi-square test akin to the one obtained when the predictor is entered as continuous. Looking at the forum and other resources, I found advice to use the -sheafcoef- command to get an overall estimate for wellbeing, although it is then a latent construct.

          I am unsure how best to interpret the results from -sheafcoef-: based on the help file, the OR for the latent variable is always positive unless I set one of the variables to be a key variable, but I am unsure about the best approach.


          I find the results from the two approaches quite different.

          . clear *

          .
          . set seed `=strreverse("1498579")'

          .
          . qui set obs 2

          .
          . g byte hid = _n

          .
          . g double hid_u = rnormal()

          .
          . quietly expand 90

          . bysort hid: generate byte wbs = runiformint(1, 3) // wellbeing score

          . generate byte rcy = rbinomial(1, normal(hid_u)) // recovery

          *** Predictor as continuous

          . logistic rcy c.wbs i.hid

          Logistic regression                             Number of obs     =        180
                                                          LR chi2(2)        =      18.98
                                                          Prob > chi2       =     0.0001
          Log likelihood = -49.027219                     Pseudo R2         =     0.1621

          ------------------------------------------------------------------------------
                   rcy | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
                   wbs |   .8755642    .288746    -0.40   0.687     .4587504    1.671089
                 2.hid |   20.13543   21.00681     2.88   0.004     2.605693    155.5961
                 _cons |   .0148139   .0179193    -3.48   0.000     .0013837    .1586008
          ------------------------------------------------------------------------------
          Note: _cons estimates baseline odds.


          *** Predictor as categorical

          tab wbs, g(ws)
          qui: logistic rcy ws2 ws3 hid

          . sheafcoef, latent(ws: ws2 ws3) eform
          ------------------------------------------------------------------------------
                   rcy |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
          -------------+----------------------------------------------------------------
          main         |
                  ws_e |   1.171548   .3402035     3.44   0.001     .5047609    1.838334
                 hid_e |   20.08222   20.95061     0.96   0.338    -20.98022    61.14465
               _cons_e |   .0006096    .001266     0.48   0.630    -.0018717     .003091
          -------------+----------------------------------------------------------------
          on_ws        |
                   ws2 |   .3537425   3.722849     0.10   0.924    -6.942907    7.650392
                   ws3 |  -1.917714     2.4154    -0.79   0.427     -6.65181    2.816382
          ------------------------------------------------------------------------------
          (_e) indicates the variables whose coefficients have been exponentiated



          • #6
            I don't know that much about the user-written sheafcoef, and so I can't advise you there, but it kind of strikes me as overkill, as if you're drifting away from the objective you gave in your first post ("to explore how original wellbeing score . . . is associated with recovery . . . after surgery").

            Could you perhaps explore the relationship between the two variables graphically, something along the lines of a dot chart?
            Code:
            version 15.1
            
            clear *
            set more on
            
            set seed `=strreverse("1498631")'
            
            quietly set obs 2
            generate byte hid = _n
            label define Hospitals 1 `" "City" "General" "' 2 `" "University" "Medical" "Center" "'
            label values hid Hospitals
            label variable hid "Hospital ID"
            generate double hid_u = rnormal()
            
            quietly expand 90
            
            generate byte wbs = runiformint(1, 3)
            label define WellbeingScores 1 Bad 2 Good 3 "Very Good"
            label values wbs WellbeingScores
            
            generate double xb = hid_u + (wbs - 1.5)
            
            generate byte rcy = rbinomial(1, invlogit(xb))
            
            // Maybe
            graph dot (mean) rcy, over(wbs) over(hid) ///
                ylabel( , format(%02.1f)) ytitle(Recovery after Surgery (Proportion))
            
            more
            
            // or maybe
            graph dot (mean) rcy, over(hid) asyvars over(wbs) ///
                ylabel( , format(%02.1f)) ytitle(Recovery after Surgery (Proportion)) ///
                    legend(off)
            
            exit



            • #7
              Thanks again for your reply. I agree that a graphical representation may be best, as -sheafcoef- creates another level of complexity by relying on latent variables.

