  • Sample Size in Count analysis

    Hello everybody and thank you in advance.

    I would like to ask about a power analysis problem, starting from an example.

    I have two medical devices (A and B) that find lymph nodes.

    My experiment consists of using the two medical devices on the same subjects.

    The goal of the experiment is to demonstrate that device A finds 20% more lymph nodes than device B.

    From a previous study I know that device A, used on about 60 subjects (from the same population, obviously), identified 85 lymph nodes.

    I wonder whether it is possible to treat the lymph node count as a Poisson (or negative binomial) variable, and to treat the single subject as a unit of time.
    In this way I would compute the sample size taking an incidence-rate ratio of 1.2 (that is, a Poisson regression coefficient of ln 1.2) as the effect size.

    What do you think? Could this be a correct strategy?

    Thanks again.
    Last edited by Gianfranco Di Gennaro; 02 Nov 2020, 09:57.

  • #2
    Gianfranco:
    welcome to this forum.
    See -help power-.
    Kind regards,
    Carlo
    (Stata 19.0)


    • #3
      Thank you very much.
      Anyway, I’m not sure I will find examples like the one I described above, but I’ll try.
      Thanks again


      • #4
        Originally posted by Gianfranco Di Gennaro:
        I wonder whether it is possible to treat the lymph node count as a Poisson (or negative binomial) variable, and to treat the single subject as a unit of time.
        In this way I would compute the sample size taking an incidence-rate ratio of 1.2 (that is, a Poisson regression coefficient of ln 1.2) as the effect size.
        It's certainly possible, but wouldn't it be something to be guided by subject matter knowledge? There is a boatload of statistical models for count outcomes beyond what you've considered—see for example here and here. What does the literature have to say about the distribution of lymph nodes with a positive diagnostic finding in your population of patients?

        I'm not sure whether Stata's power suite of commands handles count response data, but you could use simulation for power analysis, and although I've forgotten its name I believe that there is a user-written command up on SSC to assist in this when you're considering generalized linear models.

        Not necessarily to vouch for it, but if what's below is any indication, then you're not going to like what you see with your assumed baseline incidence rate and with an IRR of 1.2 taken as the detected effect size.

        version 16.1

        clear *

        set seed `=strreverse("1579985")'

        // Simulate one two-group study and fit a Poisson regression
        program define simem, rclass
            version 16.1
            syntax , [Delta(real 0.2) n(integer 100)]

            drop _all
            quietly set obs `=round(`n', 2)'    // force an even number of subjects
            generate byte trt = mod(_n, 2)      // alternate subjects between devices
            generate double mu = 85 / 60 * ( 1 + `delta' * trt)
            generate int lyn = rpoisson(mu)

            poisson lyn i.trt, irr
            tempname T
            matrix define `T' = r(table)
            return scalar p = `T'[4, 2]         // p-value for 1.trt
            return scalar i = `T'[1, 2]         // IRR for 1.trt
            return scalar b = `T'[1, 3]         // baseline rate, exp(_cons)
        end

        // Estimate power for total sample sizes from 650 to 950
        forvalues n = 650(50)950 {
            quietly simulate p = r(p) i = r(i) b = r(b), reps(3000) nodots: simem , n(`n')
            display in smcl as text _newline(1) "N = " as result `n'

            summarize b, meanonly
            display in smcl as text "Baseline IR = " as result %04.2f r(mean)

            summarize i, meanonly
            display in smcl as text "IRR = " as result %04.2f r(mean)

            generate byte pos = p < 0.05
            summarize pos, meanonly
            display in smcl as text "Power = " as result %04.2f r(mean)
        }

        N = 650
        Baseline IR = 1.42
        IRR = 1.20
        Power = 0.82

        N = 700
        Baseline IR = 1.41
        IRR = 1.20
        Power = 0.86

        N = 750
        Baseline IR = 1.42
        IRR = 1.20
        Power = 0.87

        N = 800
        Baseline IR = 1.42
        IRR = 1.20
        Power = 0.88

        N = 850
        Baseline IR = 1.42
        IRR = 1.20
        Power = 0.91

        N = 900
        Baseline IR = 1.42
        IRR = 1.20
        Power = 0.92

        N = 950
        Baseline IR = 1.42
        IRR = 1.20
        Power = 0.95
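
        As a rough cross-check on the simulated power, a normal approximation for the Wald test of log(IRR) can be computed directly. This is only a sketch, assuming two independent groups of N/2 subjects each, Poisson counts with the baseline rate of 85/60 used in the simulation, a two-sided 5% test, and a large-sample variance of log(IRR) equal to 1/(n*lambda0) + 1/(n*lambda1). The local macro names are illustrative and not part of the original post.

        ```stata
        // Approximate power for the Wald test of log(IRR), two independent groups
        local lambda0 = 85 / 60      // assumed baseline rate per subject
        local irr     = 1.2          // assumed incidence-rate ratio
        forvalues N = 650(50)950 {
            local n = `N' / 2        // subjects per group
            // large-sample standard error of log(IRR)
            local se = sqrt(1 / (`n' * `lambda0') + 1 / (`n' * `lambda0' * `irr'))
            local power = normal(ln(`irr') / `se' - invnormal(0.975))
            display "N = `N'   approximate power = " %4.2f `power'
        }
        ```

        At N = 650 this approximation gives a power of about 0.82, which is in line with the simulated value; the remaining differences at larger N are of the size one might expect from simulation noise and the approximation itself.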


        • #5
          Thank you very much, Joseph, for your simulation.
          But...

          Even though I don't have a real exposure time, do you think I can use Poisson (or negative binomial)? That is, I only look at the same patients and count lymph nodes with two different devices. There is no negative or positive finding: only a count.

          It is as if I were observing a certain number of strawberry fields (the sample size I am looking for) and counting how many blackbirds I see overall when watching with two different binoculars (I want to demonstrate that one binocular sees 20% more than the other). What is the exposure time?


          • #6
            Originally posted by Gianfranco Di Gennaro:
            My experiment consists of using the two medical devices on the same subjects.
            Forgot about that, sorry. See below for the simulation reconfigured for a crossover design.

            The crossover setup in the simulation below resembles that of a conventional 2 × 2 bioequivalence study: patients are randomly allocated to a sequence that determines which medical device (A or B) is used first. This helps identify (or at least balance out) so-called learning effects among operators and technologists. If you're using a panel of independent blinded evaluators (reading the scans off-line), then randomizing the order in which the medical images are presented to each member of the panel will help maintain the blind. If there's any delay between using the first diagnostic device and the second, then you'll probably want to look at the period effect, too, in case there is a systematic increase in nodal spread due to disease progression in the interim, which would confound detection of increased diagnostic sensitivity. The regression model used below (random-effects Poisson with a Gaussian random effect) includes both sequence and period as explanatory variables in addition to the type of diagnostic device (trt).

            You have an idea of the baseline incidence rate from prior experience, but because of the nonlinear nature of how the random effect enters the model, you'll need to get a handle on the potential range of the relevant parameter for the random effect's distribution, too. You can see below how it affects sample size estimates and power.

            version 16.1

            clear *

            set seed `=strreverse("1580035")'

            // Simulate one 2 x 2 crossover study and fit a random-effects Poisson model
            program define simem, rclass
                version 16.1
                syntax , [Delta(real 0.2) n(integer 100) Sigma2_nu(real 1)]

                drop _all
                set obs `n'

                generate byte seq = mod(_n, 2)      // alternate subjects between sequences

                generate double mu0 = 85 / 60
                generate double mu1 = mu0 * (1 + `delta')
                // subject-level multiplicative random effect, exp(nu)
                if `sigma2_nu' == 0 generate byte enu = 1
                else generate double enu = exp(rnormal(0, sqrt(`sigma2_nu')))

                generate long pid = _n
                reshape long mu, i(pid) j(trt)      // one row per subject and device
                replace mu = mu * enu

                generate byte per = cond(seq, !trt, trt)

                generate int lym = rpoisson(mu)

                xtpoisson lym i.(seq per trt), i(pid) normal irr

                tempname T
                matrix define `T' = r(table)

                return scalar b = `T'[1, 7]         // baseline rate, exp(_cons)
                return scalar i = `T'[1, 6]         // IRR for 1.trt
                return scalar p = `T'[4, 6]         // p-value for 1.trt
            end

            forvalues Vnu = 0(0.5)1 {

                display in smcl as text _newline(2) "Variance of nu = " as result %04.2f `Vnu'

                forvalues n = 200(100)300 {

                    // shift the sample sizes examined for the smaller variances
                    if `Vnu' == 0 local n = `n' + 100
                    else if round(`Vnu', 0.1) == round(0.5, 0.1) local n = `n' + 50

                    display in smcl as text _newline(1) "N = " as result `n'

                    quietly simulate b = r(b) i = r(i) p = r(p), reps(1000) nodots: simem , s(`Vnu') n(`n')

                    summarize b, meanonly
                    display in smcl as text "Baseline incidence rate = " as result %04.2f r(mean)

                    summarize i, meanonly
                    display in smcl as text "IRR = " as result %04.2f r(mean)

                    generate byte pos = p < 0.05
                    summarize pos, meanonly
                    display in smcl as text "Power = " as result %04.2f r(mean)
                }
            }


            Variance of nu = 0.00

            N = 300
            Baseline incidence rate = 1.41
            IRR = 1.20
            Power = 0.79

            N = 400
            Baseline incidence rate = 1.41
            IRR = 1.20
            Power = 0.89


            Variance of nu = 0.50

            N = 250
            Baseline incidence rate = 1.42
            IRR = 1.20
            Power = 0.81

            N = 350
            Baseline incidence rate = 1.42
            IRR = 1.20
            Power = 0.92


            Variance of nu = 1.00

            N = 200
            Baseline incidence rate = 1.43
            IRR = 1.20
            Power = 0.83

            N = 300
            Baseline incidence rate = 1.42
            IRR = 1.20
            Power = 0.95
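
            The simulated power for the crossover design can also be cross-checked with a back-of-the-envelope calculation. The sketch below rests on assumptions that are not part of the simulation code: it conditions on each subject's total count (so that the split between the two devices is binomial), ignores the sequence and period terms, and uses the fact that a log-normal random effect with variance sigma2 multiplies the expected count by exp(sigma2/2). Under those assumptions the information about log(IRR) is roughly the expected total count times p*(1 - p), with p = IRR/(1 + IRR). The macro names are illustrative.

            ```stata
            // Approximate power for the paired (crossover) comparison of two devices
            local mu0 = 85 / 60                  // assumed conditional baseline rate
            local irr = 1.2                      // assumed incidence-rate ratio
            local p   = `irr' / (1 + `irr')      // expected share of counts from device A
            foreach v of numlist 0 0.5 1 {
                // expected total count per subject over both devices
                local tot = `mu0' * (1 + `irr') * exp(`v' / 2)
                foreach n of numlist 200 250 300 350 400 {
                    // approximate standard error of log(IRR)
                    local se = 1 / sqrt(`n' * `tot' * `p' * (1 - `p'))
                    local power = normal(ln(`irr') / `se' - invnormal(0.975))
                    display "variance = " %3.1f `v' "   N = `n'   approximate power = " %4.2f `power'
                }
            }
            ```

            For a variance of 0 and N = 300 this gives a power of about 0.79, close to the simulated value. It also suggests one reason why power rises with the random-effect variance in this setup: with the conditional baseline rate held at 85/60, a larger variance raises the expected number of nodes counted per subject.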


            • #7
              Originally posted by Gianfranco Di Gennaro:
              But...

              Even though I don't have a real exposure time, do you think I can use Poisson (or negative binomial)? That is, I only look at the same patients and count lymph nodes with two different devices.
              This is not how I've seen the effectiveness of diagnostic medical devices being evaluated, at least in a government-regulated context. Rather than going off on what is analogous to exposure time, the tack is typically to set a threshold that is medically important (that is, a count threshold, or pattern or location of affected lymph nodes that makes a difference in prognosis or treatment planning), and then use the binary outcome to define positive or negative diagnostic finding with the experimental medical device for comparison of its sensitivity and specificity against a generally accepted reference standard. Depending upon where in the diagnostic workup the experimental medical device is intended to be used, either specificity or sensitivity might be emphasized in the evaluation of the medical device's effectiveness.
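
              A minimal sketch of that kind of analysis follows. Everything in it is hypothetical: the data are simulated, and the variable names (ref_pos, nodes, test_pos, test_neg), the prevalence, the Poisson means and the cut-off of two nodes are made up for illustration. A real evaluation would take the threshold from clinical guidance and the reference standard from the study protocol.

              ```stata
              version 16.1
              clear                                   // note: drops any data in memory
              set seed 2020
              set obs 200

              // hypothetical reference-standard result (1 = disease present)
              generate byte ref_pos = runiform() < 0.4
              // hypothetical node counts from the experimental device
              generate int nodes = rpoisson(cond(ref_pos, 3, 1))

              // dichotomize the count at a hypothetical medically important threshold
              local threshold = 2
              generate byte test_pos = nodes >= `threshold'
              generate byte test_neg = !test_pos

              // sensitivity: proportion test-positive among reference-positive subjects
              ci proportions test_pos if ref_pos == 1
              // specificity: proportion test-negative among reference-negative subjects
              ci proportions test_neg if ref_pos == 0
              ```

              Sample size for such a study would then be driven by the precision wanted for sensitivity or specificity (whichever is emphasized), rather than by an incidence-rate ratio.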


              • #8
                Dear Joseph, thank you very very much.
                Yours is a huge help.
                Gianfranco
