Sample Size in Count analysis

Gianfranco Di Gennaro

Join Date: Oct 2020

Posts: 139
#1

02 Nov 2020, 09:55

Hello everybody and thank you in advance.

I would like to ask you about a power analysis problem starting from an example.

I have two medical devices (A and B) that find lymph nodes.

My experiment consists of using the two medical devices on the same subjects.

The goal of the experiment is to demonstrate that device A finds 20% more lymph nodes than device B.

From a previous study I know that device A, used on about 60 subjects (from the same population, obviously), identified 85 lymph nodes.

I wonder if it is possible to treat the lymph node count as a Poisson (or negative binomial) variable, and to treat the single subject as a unit of time.
In this way I would base the sample size on an incidence rate ratio (exponentiated Poisson regression coefficient) of 1.2 as the effect size.

What do you think? Could it be a correct strategy?

Thanks again.

Last edited by Gianfranco Di Gennaro; 02 Nov 2020, 09:57.
Tags: count, poisson, samplesize
Carlo Lazzaro

Join Date: Apr 2014

Posts: 17701
#2

02 Nov 2020, 11:16

Gianfranco:
welcome to this forum.
See -help power-.

Kind regards,
Carlo
(Stata 19.0)
Gianfranco Di Gennaro

Join Date: Oct 2020

Posts: 139
#3

02 Nov 2020, 14:01

Thank you very much.
Anyway, I’m not sure I will find examples like the one I described above, but I’ll try.
Thanks again
Joseph Coveney

Join Date: Apr 2014

Posts: 4398
#4

02 Nov 2020, 19:58

Originally posted by Gianfranco Di Gennaro

I wonder if it is possible to treat the lymph node count as a Poisson (or negative binomial) variable, and to treat the single subject as a unit of time.
In this way I would base the sample size on an incidence rate ratio (exponentiated Poisson regression coefficient) of 1.2 as the effect size.

It's certainly possible, but wouldn't it be something to be guided by subject matter knowledge? There is a boatload of statistical models for count outcomes beyond what you've considered—see for example here and here. What does the literature have to say about the distribution of lymph nodes with a positive diagnostic finding in your population of patients?

I'm not sure whether Stata's power suite of commands handles count response data, but you could use simulation for power analysis, and although I've forgotten its name I believe that there is a user-written command up on SSC to assist in this when you're considering generalized linear models.

Not necessarily to vouch for it, but if what's below is any indication, then you're not going to like what you see with your assumed baseline incidence rate and with an IRR of 1.2 taken as the detected effect size.

version 16.1

clear *

set seed `=strreverse("1579985")'

* Simulate one parallel-group trial; `n' is the total number of subjects,
* split evenly between the two devices
program define simem, rclass
    version 16.1
    syntax , [Delta(real 0.2) n(integer 100)]

    drop _all
    quietly set obs `=round(`n', 2)'
    generate byte trt = mod(_n, 2)
    // Expected count per subject: 85/60 at baseline, raised by `delta' when trt == 1
    generate double mu = 85 / 60 * (1 + `delta' * trt)
    generate int lyn = rpoisson(mu)

    poisson lyn i.trt, irr
    tempname T
    matrix define `T' = r(table)
    // r(table) columns: 0b.trt, 1.trt, _cons; row 1 = estimate, row 4 = p-value
    return scalar p = `T'[4, 2]
    return scalar i = `T'[1, 2]
    return scalar b = `T'[1, 3]
end

* Power is the share of replications with p < 0.05 for the device effect
forvalues n = 650(50)950 {
    quietly simulate p = r(p) i = r(i) b = r(b), reps(3000) nodots: simem , n(`n')
    display in smcl as text _newline(1) "N = " as result `n'

    summarize b, meanonly
    display in smcl as text "Baseline IR = " as result %04.2f r(mean)

    summarize i, meanonly
    display in smcl as text "IRR = " as result %04.2f r(mean)

    generate byte pos = p < 0.05
    summarize pos, meanonly
    display in smcl as text "Power = " as result %04.2f r(mean)
}

N = 650
Baseline IR = 1.42
IRR = 1.20
Power = 0.82

N = 700
Baseline IR = 1.41
IRR = 1.20
Power = 0.86

N = 750
Baseline IR = 1.42
IRR = 1.20
Power = 0.87

N = 800
Baseline IR = 1.42
IRR = 1.20
Power = 0.88

N = 850
Baseline IR = 1.42
IRR = 1.20
Power = 0.91

N = 900
Baseline IR = 1.42
IRR = 1.20
Power = 0.92

N = 950
Baseline IR = 1.42
IRR = 1.20
Power = 0.95
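
As a rough cross-check on the simulated power above, the usual normal approximation for a Wald test of the log incidence rate ratio between two independent Poisson groups can be evaluated directly: n per arm = (z_{1-alpha/2} + z_{power})^2 * (1/mu0 + 1/mu1) / ln(IRR)^2. The sketch below is not part of the original do-file. It assumes equal allocation, one observation per subject, no overdispersion, and a two-sided alpha of 0.05; the local macro names are made up for illustration.

```stata
* Approximate sample size for comparing two independent Poisson means
* (assumptions: equal allocation, two-sided alpha = 0.05, no overdispersion)
local mu0   = 85 / 60          // baseline count per subject, from the thread
local irr   = 1.2              // targeted incidence rate ratio
local mu1   = `mu0' * `irr'
local alpha = 0.05

foreach power in 0.80 0.90 {
    local z = invnormal(1 - `alpha' / 2) + invnormal(`power')
    // Var(log IRR) is roughly 1/(n*mu0) + 1/(n*mu1) with n subjects per arm
    local n_per_arm = ceil(`z'^2 * (1 / `mu0' + 1 / `mu1') / ln(`irr')^2)
    display as text "Power = " as result %4.2f `power' ///
        as text ", n per arm = " as result `n_per_arm' ///
        as text ", total N = " as result (2 * `n_per_arm')
}
```

Under these assumptions the formula gives about 306 subjects per arm (612 in total) for 80% power and about 410 per arm (820 in total) for 90% power, which is in the same range as the simulated values.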
Gianfranco Di Gennaro

Join Date: Oct 2020

Posts: 139
#5

03 Nov 2020, 05:47

Thank you very much, Joseph, for your simulation.
But...

Even though I don't have a real exposure time, do you think I can use Poisson (or negative binomial)? That is, I only look at the same patients and count lymph nodes with two different devices. There is no negative or positive: only a count.

It is as if I were observing a certain number of strawberry fields (the sample size I am looking for) and counting how many blackbirds I see overall when watching with two different binoculars (I want to demonstrate that one pair of binoculars sees 20% more than the other). What's the exposure time?
Joseph Coveney

Join Date: Apr 2014

Posts: 4398
#6

03 Nov 2020, 05:49

Originally posted by Gianfranco Di Gennaro

My experiment consists of using the two medical devices on the same subjects.

Forgot about that, sorry. See below for the simulation reconfigured for a crossover design.

The crossover setup in the simulation below resembles that of a conventional 2 × 2 bioequivalence test: patients are randomly allocated to the sequence in which the two medical devices (A and B) are used. This will help identify (or at least smooth out) so-called learning effects among operators and technologists. If you're using a panel of independent blinded evaluators (reading the scans off-line), then randomizing the sequence in which the medical images are presented to each member of the panel will help maintain the blind. If there's any kind of delay between using the first diagnostic device and the second, then you'll probably want to look at the period effect, too, in case there is any systematic increase in nodal spread due to disease progression in the interim, which would confound detection of increased diagnostic sensitivity. The regression model used below (random-effects Poisson with a Gaussian random effect) includes both sequence and period as explanatory variables in addition to the type of diagnostic device (trt).

You have an idea of the baseline incidence rate from prior experience, but because of the nonlinear nature of how the random effect enters the model, you'll need to get a handle on the potential range of the relevant parameter for the random effect's distribution, too. You can see below how it affects sample size estimates and power.

version 16.1

clear *

set seed `=strreverse("1580035")'

* Simulate one 2 x 2 crossover trial; `n' is the number of patients
program define simem, rclass
    version 16.1
    syntax , [Delta(real 0.2) n(integer 100) Sigma2_nu(real 1)]

    drop _all
    set obs `n'

    // Alternate patients between the two device sequences
    generate byte seq = mod(_n, 2)

    // Expected counts per patient for trt == 0 (mu0) and trt == 1 (mu1)
    generate double mu0 = 85 / 60
    generate double mu1 = mu0 * (1 + `delta')
    // Patient-level multiplicative random effect exp(nu), nu ~ N(0, sigma2_nu)
    if `sigma2_nu' == 0 generate byte enu = 1
    else generate double enu = exp(rnormal(0, sqrt(`sigma2_nu')))

    generate long pid = _n
    reshape long mu, i(pid) j(trt)
    replace mu = mu * enu

    // Period in which each device is used, given the patient's sequence
    generate byte per = cond(seq, !trt, trt)

    generate int lym = rpoisson(mu)

    xtpoisson lym i.(seq per trt), i(pid) normal irr

    tempname T
    matrix define `T' = r(table)

    // r(table) column 6 is 1.trt (the IRR), column 7 is _cons; row 4 = p-value
    return scalar b = `T'[1, 7]
    return scalar i = `T'[1, 6]
    return scalar p = `T'[4, 6]
end

forvalues Vnu = 0(0.5)1 {

    display in smcl as text _newline(2) "Variance of nu = " as result %04.2f `Vnu'

    forvalues n = 200(100)300 {

        // Shift the grid of sample sizes according to the variance scenario
        if `Vnu' == 0 local n = `n' + 100
        else if round(`Vnu', 0.1) == round(0.5, 0.1) local n = `n' + 50

        display in smcl as text _newline(1) "N = " as result `n'

        quietly simulate b = r(b) i = r(i) p = r(p), reps(1000) nodots: simem , s(`Vnu') n(`n')

        summarize b, meanonly
        display in smcl as text "Baseline incidence rate = " as result %04.2f r(mean)

        summarize i, meanonly
        display in smcl as text "IRR = " as result %04.2f r(mean)

        generate byte pos = p < 0.05
        summarize pos, meanonly
        display in smcl as text "Power = " as result %04.2f r(mean)
    }
}

Variance of nu = 0.00

N = 300
Baseline incidence rate = 1.41
IRR = 1.20
Power = 0.79

N = 400
Baseline incidence rate = 1.41
IRR = 1.20
Power = 0.89

Variance of nu = 0.50

N = 250
Baseline incidence rate = 1.42
IRR = 1.20
Power = 0.81

N = 350
Baseline incidence rate = 1.42
IRR = 1.20
Power = 0.92

Variance of nu = 1.00

N = 200
Baseline incidence rate = 1.43
IRR = 1.20
Power = 0.83

N = 300
Baseline incidence rate = 1.42
IRR = 1.20
Power = 0.95
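
One possible reading of why fewer patients are needed as the variance of nu grows, offered as an interpretation rather than something stated in the post: simem draws the random effect as exp(nu) with nu ~ N(0, sigma2_nu), and a log-normal variable of that form has mean exp(sigma2_nu / 2), not 1. The marginal expected count per patient therefore rises with the variance while the conditional baseline rate stays at 85/60, so the higher-variance scenarios contain more events. The sketch below only evaluates that formula; the local macro names are made up for illustration.

```stata
* Marginal expected count per patient at baseline when the random effect
* is exp(nu) with nu ~ N(0, sigma2_nu), so that E[exp(nu)] = exp(sigma2_nu / 2)
local mu0 = 85 / 60            // conditional baseline rate used in simem

foreach s2 in 0 0.5 1 {
    display as text "sigma2_nu = " as result %3.1f `s2' ///
        as text ", marginal baseline mean = " as result %5.3f (`mu0' * exp(`s2' / 2))
}
```

If the 85/60 figure is meant as a population-average count, one option would be to draw nu with mean -sigma2_nu / 2, so that exp(nu) averages 1. Whether that is appropriate depends on how the earlier estimate was obtained.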
Joseph Coveney

Join Date: Apr 2014

Posts: 4398
#7

03 Nov 2020, 06:13

Originally posted by Gianfranco Di Gennaro

But...

Even though I don't have a real exposure time, do you think I can use Poisson (or negative binomial)? That is, I only look at the same patients and count lymph nodes with two different devices.

This is not how I've seen the effectiveness of diagnostic medical devices being evaluated, at least in a government-regulated context. Rather than going off on what is analogous to exposure time, the tack is typically to set a threshold that is medically important (that is, a count threshold, or pattern or location of affected lymph nodes that makes a difference in prognosis or treatment planning), and then use the binary outcome to define positive or negative diagnostic finding with the experimental medical device for comparison of its sensitivity and specificity against a generally accepted reference standard. Depending upon where in the diagnostic workup the experimental medical device is intended to be used, either specificity or sensitivity might be emphasized in the evaluation of the medical device's effectiveness.
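
To make the threshold approach concrete, here is a minimal sketch on simulated data. Everything in it is hypothetical and chosen only for illustration: the prevalence of 0.4, the Poisson means of 3 and 0.8, the cut-off of two or more nodes, and the variable names. It dichotomizes the device's count at the threshold and compares the resulting binary finding with a reference standard.

```stata
* Hypothetical illustration: dichotomize a node count and compare it
* with a reference standard (all numbers below are made up)
clear
set seed 2020
set obs 200

generate byte ref_pos = runiform() < 0.4                  // reference-standard result
generate int  n_nodes = rpoisson(cond(ref_pos, 3, 0.8))   // nodes found by the device
generate byte dev_pos = n_nodes >= 2                      // assumed clinical threshold

tabulate dev_pos ref_pos, column

* Sensitivity: share of reference-positive patients flagged by the device
summarize dev_pos if ref_pos == 1, meanonly
display as text "Sensitivity = " as result %4.2f r(mean)

* Specificity: share of reference-negative patients not flagged by the device
summarize dev_pos if ref_pos == 0, meanonly
display as text "Specificity = " as result %4.2f (1 - r(mean))
```

With a binary outcome of this kind, the sample size question becomes one about proportions (sensitivity and specificity) rather than about a count model.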
Gianfranco Di Gennaro

Join Date: Oct 2020

Posts: 139
#8

03 Nov 2020, 09:02

Dear Joseph, thank you very very much.
Yours is a huge help.
Gianfranco