
  • How to identify the number of tests and adjust for multiple testing?

    Dear Statalist Members,

    I have searched and read multiple resources on multiple testing (multiple comparisons). In principle, I need to adjust for multiple testing whenever it exists. Yet in specific situations, I am still not sure how to identify and adjust for multiplicity. I would greatly appreciate it if any members could help with the two questions below. Once we have identified the number of tests and how to adjust, we can use Stata to perform the adjustment.

    1. In which scenarios do we need to adjust for multiplicity, and is the number of tests in column (3) correct? I list here a table with 3 columns, (1), (2), and (3), representing the scenario, a detailed description, and the number of tests, respectively, as attached
    [Attached image: Table_Multiplicity.png]
    or below:
    Columns: Scenario (1) | Description (2) | # of tests (or comparisons) (3)

    Scenario 1
    Description: We have 2 variables:
    • Y as the outcome, and
    • X as the type of intervention (1, 2, 3).
    The goal is to compare Y between each intervention (2, 3) and 1 (the control).
    # of tests: 2?

    Scenario 2
    Description: We have:
    • 1 outcome, Y
    • 20 exposure variables, X1, X2, X3, ..., X20
    Our goal is to test how many of these exposure variables are associated with outcome Y.
    # of tests: 20?

    Scenario 3
    Description: We have:
    • 20 outcomes, Y1, Y2, Y3, ..., Y20
    • 1 exposure variable, X
    Our goal is to test whether X is associated with each of these 20 outcomes.
    # of tests: 20?

    Scenario 4
    Description: We have:
    • 1 outcome, Y
    • 1 exposure variable, X
    • 4 covariates or confounders, C1, C2, C3, and C4
    • However, we will use 3 types (methods) of statistical model to test the association of X and Y, adjusting for the 4 confounders
    Our goal is to test whether X is associated with Y.
    # of tests: 3, because we use 3 statistical methods?

    Scenario 5
    Description: We have:
    • 1 outcome, Y, measured at baseline (Y0) and repeated at 3 timepoints (Y1, Y2, Y3), so in total we have 4 values of the outcome at 4 timepoints for each participant.
    Our goal is to test whether each repeated measurement (Y1, Y2, Y3) is different from baseline (Y0).
    # of tests: 3?

    Scenario 6
    Description: We have:
    • 1 outcome, Y
    • 2 exposure variables, X1, X2
    Our goal is to test whether X1, X2, and their interaction are associated with Y.
    # of tests: 3, because we test 2 exposure variables and 1 interaction?

    2. How do I estimate the FDR (false discovery rate) from raw p-values? I found a source, http://www.biostathandbook.com/multiplecomparisons.html, which says that the BH-adjusted (Benjamini-Hochberg) p-value is calculated as the raw p-value multiplied by the number of tests and divided by the rank of the raw p-value. However, this calculation seems to give illogical results. For example, suppose I have 5 tests; the raw p-values and the FDR-adjusted p-values are presented below:
    Raw p-value   Rank   FDR-adjusted p
    0.01          1      0.05
    0.012         2      0.03
    0.06          3      0.1
    0.3           4      0.375
    0.5           5      0.5
    My question is: why does the first raw p-value (0.01), which is smaller than the second (0.012), have a higher FDR-adjusted p (0.050) than the second (0.030)? If we set FDR = 0.05, is only the second p-value significant, while the first, smaller raw p-value is not? What is the correct formula for FDR estimation?
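To make the arithmetic concrete, here is a minimal Python sketch of the calculation described above (raw p-value times the number of tests, divided by the rank), applied to the five example p-values. The second function adds a step that I am assuming is what the standard Benjamini-Hochberg adjusted p-value uses: working from the largest p-value downward, each adjusted value is capped at the smallest adjusted value seen so far, so the adjusted p-values can never decrease as the raw p-values increase. The function names `naive_bh` and `monotone_bh` are hypothetical and chosen only for illustration.

```python
def naive_bh(pvalues):
    """Raw p * m / rank for each p-value, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    for rank, i in enumerate(order, start=1):
        adjusted[i] = pvalues[i] * m / rank
    return adjusted


def monotone_bh(pvalues):
    """Same formula, followed by a running minimum from the largest rank down.

    Assumption: this running-minimum step is the one used for
    Benjamini-Hochberg adjusted p-values; it is not spelled out in the post.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted p-values are capped at 1
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.012, 0.06, 0.3, 0.5]
naive = naive_bh(raw)
adjusted = monotone_bh(raw)

# Print raw p, naive adjusted p, and monotone adjusted p side by side.
for p, a, b in zip(raw, naive, adjusted):
    print(f"{p:<8} {a:<8.3f} {b:.3f}")
```

Under that assumption, the first p-value (0.01) would receive an adjusted value of 0.030 rather than 0.050, the same as the second, which removes the ordering inconsistency in the table.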

    Thank you very much; I look forward to your help.