
  • Clustered data and xtlogit and rare events

    Hi listers,

    I have data from a small clustered randomised trial comparing two treatments. The outcome is binary (side effects: yes/no) and quite rare for one of the treatments (8% incidence vs. 17% for the other treatment), so much so that in some centres (clusters) none of the patients had side effects.

    I was planning to use -xtlogit-, but it does not converge. I am not sure what to use instead: should I opt for Poisson regression (-xtpoisson-) with robust SEs, assuming it is a good approximation? Alternatively, can I run a penalised maximum likelihood logistic regression using mixed models, or is there a correction that can be made within the -xtlogit- options?

    xtset clusters
    xtlogit success i.intervention
    Last edited by Laura Myles; 22 Oct 2021, 10:49.

  • #2
    The modified Poisson regression approach as developed by Zou, and then Zou and Donner is appropriate to your use case of clustered binary data.

    One may code this as a Poisson model with robust (for non-clustered data) or cluster-robust ("sandwich") standard errors.

    Code:
    glm success i.intervention, vce(cluster clusterid) family(poisson) link(log) eform
    poisson success i.intervention, vce(cluster clusterid) irr   // equivalent model to above
    References:

    1) Zou, G. (2004). A Modified Poisson Regression Approach to Prospective Studies with Binary Data. American Journal of Epidemiology, 159(7), 702–706. https://doi.org/10.1093/aje/kwh090

    2) Zou, G. Y., & Donner, A. (2013). Extension of the modified Poisson regression model to prospective studies with correlated binary data. Statistical Methods in Medical Research, 22(6), 661–670. https://doi.org/10.1177/0962280211427759



    • #3
      If it’s an experiment, why not just use linear regression and cluster your standard errors? No need for logit. The model is saturated, so you’ll get the same average treatment effect.



      • #4
        Originally posted by Laura Myles
        I have data for a small clustered randomised trial comparing two treatments. The outcome is . . . quite rare for one of the treatments . . . so much so that in some centres (clusters), none of the patients had side effects.
        I take it that you have a cluster-randomized trial (each clinic is assigned to a single treatment) and not a clustered, randomized trial (each clinic enrolls into both treatment groups).

        If you're interested in just testing for a difference in proportions between treatment groups of patients with one or more side effects, then maybe use the old arcsine-square-root transformation of the individual cluster proportions? It seems to have the greatest power to detect a difference of the magnitude (17% versus 8%) that you're seeing, i.e., a power of about 44%. And it maintains test size (5¼%) under the null of no difference. (See below for the simulation results.)

        Even conventional weighted least squares has somewhat greater power (23%) than cluster-robust standard errors (20%) and maintains test size a little better (5¾ versus 6+%). It also shares the advantage of the latter that there's no transformation, and so you can see the average treatment effect directly.
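
        As a rough sketch only (not a definitive implementation), the two cluster-level analyses could be run on the actual trial data along these lines. The variable names success, intervention and clusterid are assumptions taken from the earlier posts, and the half-event adjustment for clinics with a proportion of exactly 0 or 1 is one common choice among several.

        ```stata
        preserve

        * one row per clinic: observed proportion (p) and number of patients (n)
        collapse (mean) p = success (count) n = success, by(clusterid intervention)

        * arcsine-square-root transform, nudging proportions of exactly 0 or 1
        generate double asq = asin(sqrt(cond(p == 0, 0.5 / n, ///
            cond(p == 1, (n - 0.5) / n, p))))
        anova asq intervention

        * weighted least squares on the untransformed proportions
        regress p i.intervention [aweight = n]

        restore
        ```

        Because each clinic receives a single treatment, collapsing by clusterid and intervention leaves exactly one observation per clinic, which is the unit of analysis for both tests.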

        In the simulation below, I assume that by small you mean something like, say, two dozen clinics, and that the number of patients in each clinic is on the order of what can give percentages in the 8 to 17 range with a rare outcome. In the output, -reg- is linear regression with clustered standard errors; -aov- is the classic ANOVA with arcsine-square-root-transformed proportions; -wls- is linear regression of the proportions using cluster size as analytic weights; and -zer- is the number of clinics out of 24 where no patient had any side effect.

        .
        . version 17.0

        .
        . clear *

        .
        . set seed `=strreverse("1632898")'

        .
        . program define simem, rclass
          1.         version 17.0
          2.         syntax [, Experimental(real 0.17) Control(real 0.08) n(integer 24)]
          3.
        .         drop _all
          4.         quietly set obs `n'
          5.         generate int cid = _n
          6.         generate double cid_u = rnormal()
          7.
        .         generate byte trt = mod(_n, 2)
          8.
        .         generate byte siz = runiformint(2, cond(trt, ///
        >                 round(1/max(`experimental', `control')), ///
        >                 round(1 / min(`experimental', `control'))))
          9.         quietly expand siz
         10.         bysort cid: generate int pid = _n - 1
         11.
        .         generate double xbu = logit(cond(trt, `experimental', `control')) + cid_u
         12.
        .         generate byte ae = rbinomial(1, invlogit(xbu))
         13.
        .         regress ae i.trt, vce(cluster cid)
         14.         test 1.trt
         15.         tempname reg
         16.         scalar define `reg' = r(p) < 0.05
         17.
        .         by cid: egen double mea = mean(ae)
         18.         count if !mea & !pid
         19.         tempname zer
         20.         scalar define `zer' = r(N)
         21.
        .         generate double asq = asin(sqrt( ///
        >                 cond(mea == 0, 0.5 / siz, ///
        >                 cond(mea == 1, (siz - 0.5) / siz, ///
        >                         mea))))
         22.         anova asq trt if !pid
         23.         test 1.trt
         24.         tempname aov
         25.         scalar define `aov' = r(p) < 0.05
         26.
        .         /* quietly tabulate cid, generate(cid)
        >         exlogistic ae trt, condvars(cid2-cid`n') midp nolog
        >         return scalar exl = e(p_probtest)[1,1] < 0.05 */
        .
        .         regress mea i.trt if !pid [aweight=siz]
         27.         test 1.trt
         28.         return scalar wls = r(p) < 0.05
         29.         return scalar reg = `reg'
         30.         return scalar aov = `aov'
         31.         return scalar zer = `zer'
         32. end

        .
        . simulate reg = r(reg) aov = r(aov) wls = r(wls) zer = r(zer), reps(10000) nodots: simem

              Command: simem
                  reg: r(reg)
                  aov: r(aov)
                  wls: r(wls)
                  zer: r(zer)


        . summarize

            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
                 reg |     10,000        .204    .4029891          0          1
                 aov |     10,000       .4435    .4968223          0          1
                 wls |     10,000        .229    .4202103          0          1
                 zer |     10,000     12.2338    2.482852          4         21

        .
        . local null = (0.08 + 0.17) / 2

        . simulate reg = r(reg) aov = r(aov) wls = r(wls) zer = r(zer), reps(10000) nodots: simem , ///
        >         e(`null') c(`null')

              Command: simem, e(.125) c(.125)
                  reg: r(reg)
                  aov: r(aov)
                  wls: r(wls)
                  zer: r(zer)


        . summarize

            Variable |        Obs        Mean    Std. dev.       Min        Max
        -------------+---------------------------------------------------------
                 reg |     10,000       .0617    .2406219          0          1
                 aov |     10,000       .0524    .2228435          0          1
                 wls |     10,000       .0574     .232617          0          1
                 zer |     10,000     12.3393    2.482417          4         21

        .
        . exit

        end of do-file


        .



        • #5
          As an addendum, the simulation above assumes that the occurrence of side effects would discourage further enrollment by a clinic. If you were able to enforce uniform enrollment between clinics regardless of assigned treatment, then the advantages of one approach over the others nearly disappear: repeating the simulation above, but with a fixed enrollment of 18 patients in each clinic, the three methods perform about the same, with cluster-robust linear regression having slightly higher rates of both true- and false-positive test findings.

          .
          . simulate reg = r(reg) aov = r(aov) wls = r(wls) zer = r(zer), reps(10000) nodots: simem

                Command: simem
                    reg: r(reg)
                    aov: r(aov)
                    wls: r(wls)
                    zer: r(zer)


          . summarize

              Variable |        Obs        Mean    Std. dev.       Min        Max
          -------------+---------------------------------------------------------
                   reg |     10,000       .3682    .4823402          0          1
                   aov |     10,000       .3605    .4801696          0          1
                   wls |     10,000       .3477     .476264          0          1
                   zer |     10,000      4.7654    1.903136          0         12

          .
          . local null = (0.08 + 0.17) / 2

          . simulate reg = r(reg) aov = r(aov) wls = r(wls) zer = r(zer), reps(10000) nodots: simem , ///
          >         e(`null') c(`null')

                Command: simem, e(.125) c(.125)
                    reg: r(reg)
                    aov: r(aov)
                    wls: r(wls)
                    zer: r(zer)


          . summarize

              Variable |        Obs        Mean    Std. dev.       Min        Max
          -------------+---------------------------------------------------------
                   reg |     10,000        .055    .2279917          0          1
                   aov |     10,000       .0512    .2204165          0          1
                   wls |     10,000       .0495    .2169204          0          1
                   zer |     10,000      4.1914    1.855454          0         13

          .
          . exit

          end of do-file


          .


          But if the clinics' enrollment rates did tend to be affected in line with observed side effect rates, then the two alternatives might still be worth considering.
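
          The modified program for the fixed-enrollment runs is not shown above. One plausible way to get it, offered as an assumption rather than as the code actually used, is to replace the three-line runiformint() statement that draws the cluster sizes in simem with a constant:

          ```stata
          * hypothetical replacement for the "generate byte siz = runiformint(...)" statement in simem:
          * every clinic enrolls the same number of patients, regardless of treatment
          generate byte siz = 18
          ```

          Everything else in the program (the expand step, the outcome generation and the three tests) would stay as it is.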



          • #6
            Thanks all.

            Joseph Coveney, you are correct: it is a cluster-randomized trial (each clinic is assigned to a single treatment).

            I should have made it clearer that, as well as looking at adverse events, we are interested in the occurrence of one specific side effect; i.e., the outcome is the occurrence of that side effect, binary coded as yes/no. It is possible that enrolment is affected by side effects.

            What approach is suitable in this case? I would be inclined to use the Poisson approach suggested by Leonardo Guizzetti: are there any other considerations I should make?



            • #7
              If it's truly a randomized trial -- even at the cluster level -- then, without covariates, you are comparing the proportions of ones between treated and control. If y is your outcome, this is done as

              Code:
              reg y treatment, vce(cluster clinicid)
              If you replace reg with, say, logit, you'll get the same answer. And with Poisson, too. (With nonlinear models you'd use the margins command.) If you add covariates then there can be differences, but you can still use a pooled method and cluster standard errors.
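
              A minimal sketch of that comparison, assuming the same variable names as in the command above (y, treatment, clinicid) and entering treatment as a factor variable so that margins reports the difference in predicted proportions:

              ```stata
              * linear probability model: the coefficient on 1.treatment is the difference in proportions
              regress y i.treatment, vce(cluster clinicid)

              * logit: average discrete change in the predicted probability
              logit y i.treatment, vce(cluster clinicid)
              margins, dydx(treatment)

              * Poisson: average discrete change in the predicted mean of y
              poisson y i.treatment, vce(cluster clinicid)
              margins, dydx(treatment)
              ```

              With treatment as the only regressor, the three point estimates should coincide; the standard errors and p-values can differ somewhat because the reference distributions and delta-method calculations are not identical.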



              • #8
                Originally posted by Laura Myles
                It is possible that enrolment is affected by side effects

                What approach is suitable in this case? I would be inclined to use the Poisson approach . . . are there any other considerations I should make?
                Well, if you have doubts, then run some simulations under the observed conditions for sample size (numbers of clinics assigned to each intervention) and enrollment patterns, and see which method has the best operating characteristics under the null and alternative hypotheses. You can use the code above as a starting point for how that could be done. Based upon what was obtained above, it seems that a differential pattern of enrollment, possibly due to the outcome itself, would be an important consideration, if NHST is what you're interested in.
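
                For example, once simem has been defined as in the earlier post, its options can be used to rerun it under a different design. The 16 clinics and the seed here are made-up values for illustration, not the trial's actual numbers:

                ```stata
                * power: observed event rates (program defaults), hypothetical 16 clinics
                simulate reg = r(reg) aov = r(aov) wls = r(wls) zer = r(zer), ///
                    reps(10000) seed(20211022) nodots: simem, n(16)
                summarize reg aov wls

                * test size: both arms share the pooled rate of 12.5%
                simulate reg = r(reg) aov = r(aov) wls = r(wls) zer = r(zer), ///
                    reps(10000) seed(20211022) nodots: simem, experimental(0.125) control(0.125) n(16)
                summarize reg aov wls
                ```

                The means reported by summarize are the rejection rates at the 5% level: power in the first run and test size in the second. To reflect the real enrollment pattern, the line that draws siz inside simem would also need to be adapted.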



                • #9
                  Thanks again for your replies.

                  Joseph Coveney, thanks for sharing your code. I will familiarise myself with it, as I am not too sure what it does.
