  • Sample size calculation taking into account contamination of exposure variable

    Hi guys, I've been learning the ropes with Stata over the last few months, but I've hit a stumbling block.

    I want to make two different sample size calculations:
    • first one relates to a cohort study endpoint which is binary
      • I know the outcome proportion in the control group, the difference I want to detect, the alpha and the desired power
      • power twoproportions x y, test(chi2)
    • second one relates to a cohort study endpoint which is continuous
      • I know the outcome mean in the control group (and the standard deviation), the difference I want to detect, the alpha and the desired power
      • power twomeans x y, sd(z)
    HOWEVER, the exposure is not "clean" - there is contamination, whereby 10% of the "exposed" are actually unexposed, and vice versa (i.e. the exposure has sensitivity 90% and specificity 90%)

    How does this factor into my sample size calculation?

    I can foresee that it will increase the sample size needed to achieve a given power, but how exactly do I calculate this in Stata?

  • #2
    Assuming that the exposure misclassification is random, you can reason as follows. The "exposed" group is actually a 90-10 mixture of truly exposed and truly unexposed people, so the expected mean outcome in the "exposed" group is not x but 0.9*x + 0.1*y. Similarly, the expected outcome in the "unexposed" group is actually 0.9*y + 0.1*x. So you can use those instead of x and y as the expected outcomes in your power calculation. Since these will be closer to each other than x and y are, your required sample size for a given power will be higher.
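
    A minimal Stata sketch of this adjustment for the continuous endpoint. The means (10 and 12) and the standard deviation (4) are hypothetical placeholders, since the thread gives no numbers, and it relies on the default alpha(0.05) and power(0.8) of the power command:

    ```stata
    * Hypothetical "clean" inputs, not taken from the thread
    local x  = 10
    local y  = 12
    local sd = 4

    * Expected means once each group is a 90-10 mixture
    local x_obs = 0.9*`x' + 0.1*`y'
    local y_obs = 0.9*`y' + 0.1*`x'

    * Sample size ignoring contamination
    power twomeans `x' `y', sd(`sd')

    * Sample size allowing for contamination
    power twomeans `x_obs' `y_obs', sd(`sd')
    ```

    With these inputs the difference to detect shrinks from 2 to 1.6, so the second call should report the larger sample size. Strictly speaking, mixing also inflates the within-group variance slightly, to sd^2 + 0.9*0.1*(y - x)^2 if both true groups share the same SD. The sketch ignores that, as the reasoning above does.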

    Now, if the misclassification is systematic (i.e. is associated at least in part with the outcome) then you have a bias problem that is, in my view, more important than power issues, and is very difficult to resolve.

    • #3
      Thanks, I'm talking about non-differential misclassification (i.e. not systematic) so I think what you initially said should hold...

      Thanks so much

      • #4
        Wait a minute,

        Thinking this through - doesn't your answer only apply to continuous outcome variables?

        How do I address this with binary outcome variables (as proportions)?

        See my initial post - I have two separate calculations to adjust for...

        • #5
          It works exactly the same way with dichotomous outcomes. If the "clean" outcome probabilities were, say, .3 and .5, the expected outcome probabilities would be calculated by the same formulas, and would come out as .32 and .48, respectively.
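
          As a sketch, the .3 and .5 example can be run through power twoproportions like this. The local macro names are illustrative, and the default alpha(0.05) and power(0.8) are assumed:

          ```stata
          * "Clean" probabilities from the example above
          local p1 = 0.3
          local p2 = 0.5

          * Expected probabilities under 90-10 mixing
          local p1_obs = 0.9*`p1' + 0.1*`p2'
          local p2_obs = 0.9*`p2' + 0.1*`p1'
          display "contaminated probabilities: " `p1_obs' " and " `p2_obs'

          * Sample size ignoring contamination
          power twoproportions `p1' `p2', test(chi2)

          * Sample size allowing for contamination
          power twoproportions `p1_obs' `p2_obs', test(chi2)
          ```

          The display line should show .32 and .48, and the second power call should ask for more subjects than the first because the two probabilities are closer together.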

          • #6
            Thanks again for your response - I really appreciate it.

            I'm sorry to keep struggling with this issue; it's a bit embarrassing.

            I'm learning this with some online lectures I've found - but I can't seem to make your approach get the correct numbers in this table:

            [Attached image: Untitled.jpg, the lecture slide containing the table; not reproduced here]


            I've applied your approach to some sample data (data with OR = 2, like in this slide) but I don't get the same "adjusted" OR... can you explain how you do it?

            I don't have access to the book reference on the slide

            • #7
              You're asking a different question here. Your original post was about how you would go about calculating sample size for given expected probabilities. What you ask now is about the effect of this contamination on the observed odds ratio. That is a more long-winded calculation. To be honest, even if I had the patience to write it out on paper today, carrying over all those equations into the forum would be an even longer task. The algebra involved is not, in principle, deep or difficult. But the equations are lengthy and full of fractions and more fractions.

              You might find http://www.scielosp.org/pdf/rbepid/v...8-02-00341.pdf helpful. It sets out some of the simpler parts of the analysis in algebraic formulas, though it eventually moves to computer-generated results without a full exposition of the calculations (probably for the same reasons I mentioned in the paragraph above).
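
              For a single set of inputs, the observed odds ratio can at least be worked out numerically rather than algebraically. The following is only a sketch under stated assumptions: a cohort with non-differential misclassification, sensitivity and specificity of 0.9 as in the original post, the .3 and .5 risks from #5, and a hypothetical true exposure prevalence of 0.5. It is not necessarily the setup behind the table on the slide, and all scalar names are illustrative:

              ```stata
              * Assumed inputs (prev is hypothetical; r0 and r1 are the example risks from #5)
              scalar sens = 0.9
              scalar spec = 0.9
              scalar prev = 0.5
              scalar r0   = 0.3
              scalar r1   = 0.5

              * Share of each classified group that is correctly classified
              scalar ppv = sens*prev / (sens*prev + (1 - spec)*(1 - prev))
              scalar npv = spec*(1 - prev) / (spec*(1 - prev) + (1 - sens)*prev)

              * Expected risks in the groups as classified
              scalar r1_obs = ppv*r1 + (1 - ppv)*r0
              scalar r0_obs = npv*r0 + (1 - npv)*r1

              * True and observed odds ratios
              scalar or_true = (r1/(1 - r1)) / (r0/(1 - r0))
              scalar or_obs  = (r1_obs/(1 - r1_obs)) / (r0_obs/(1 - r0_obs))
              display "true OR = " or_true "   observed OR = " or_obs
              ```

              With these inputs the true odds ratio is about 2.33 and the observed one about 1.96. Note that the 90-10 mixing weights used earlier in the thread correspond to a true exposure prevalence of 0.5; with any other prevalence, ppv and npv differ from 0.9 and the attenuation changes accordingly.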

              • #8
                Ah, that link has been helpful.

                Sorry, I did skip from one issue to another without explaining myself. Thanks for your help though - really appreciate it.
