Dear Statalist Members,
I have searched and read several resources on multiple testing (multiple comparisons). In principle, I understand that I need to adjust for multiple testing when it is present. In specific situations, however, I am still not sure how to identify multiplicity and adjust for it. I would greatly appreciate it if any members could help with the two questions below. Once we have understood how many tests there are and how to adjust, we can use Stata to perform the adjustment.
1. In which scenarios do we need to adjust for multiplicity, and is the number of tests in column 3 correct? I have listed the scenarios in a table with three columns: (1) the scenario, (2) a detailed description, and (3) the number of tests. The table is attached and also reproduced below.
2. How do we estimate the FDR (false discovery rate) from raw p-values? I found a source, http://www.biostathandbook.com/multiplecomparisons.html, which says that the BH-adjusted p-value (FDR) is calculated as the raw p-value multiplied by the number of tests and divided by the rank of the raw p-value. However, this calculation seems illogical in some cases. For example, suppose I have 5 tests whose raw p-values and FDR-adjusted p-values are shown in the second table below.
The question is: why does the first test, whose raw p-value (0.010) is smaller than that of the second (0.012), have a higher FDR-adjusted p-value (0.050 versus 0.030)? If we set FDR = 0.05, is only the second p-value significant, while the first, with the smaller raw p-value, is not? What is the correct formula for estimating the FDR?
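As a minimal sketch in Python (the post targets Stata, but no Stata command is assumed here), the formula described above, raw p-value × number of tests ÷ rank, can be written as follows. Applied to the five raw p-values from the example, it reproduces the "FDR adjusted p" column, including the pair 0.05 and 0.03 that runs against the raw order. The function name `naive_bh` is hypothetical.

```python
def naive_bh(pvalues):
    """Return p * m / rank for each raw p-value (rank 1 = smallest p).

    This is only the per-rank quantity described in the question;
    no monotonicity step is applied.
    """
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    for rank, i in enumerate(order, start=1):
        adjusted[i] = pvalues[i] * m / rank
    return adjusted


raw = [0.01, 0.012, 0.06, 0.3, 0.5]
print([round(p, 3) for p in naive_bh(raw)])  # [0.05, 0.03, 0.1, 0.375, 0.5]
```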
Thank you very much; I look forward to your help.
Table for question 1:
Scenario (1) | Description (2) | No. of tests (or comparisons) (3) |
1 | We have 2 variables | 2? |
2 | We have: | 20? |
3 | We have: | 20? |
4 | We have: | 3, because we use 3 statistical methods? |
5 | We have: | 3? |
6 | We have: | 3, because we test 2 exposure variables and 1 interaction? |
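Column 3 matters because a family-wise correction uses the number of tests m directly. As a minimal illustration, assuming the family consists of exactly the m tests counted in column 3, and using the Bonferroni correction (one common choice, picked here only as an example), each raw p-value is multiplied by m and capped at 1. The p-values below are hypothetical, not taken from any scenario.

```python
def bonferroni(pvalues):
    """Bonferroni-adjusted p-values: min(1, m * p) for each of the m tests."""
    m = len(pvalues)
    return [min(1.0, m * p) for p in pvalues]


# Hypothetical raw p-values for a family of m = 3 tests
# (e.g. scenario 6, if it is counted as 2 exposures + 1 interaction).
raw = [0.01, 0.02, 0.04]
adjusted = bonferroni(raw)
print([round(p, 3) for p in adjusted])  # [0.03, 0.06, 0.12]
```

Changing m changes every adjusted value, so the count in column 3 has to be settled before any adjustment is applied.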
Table for question 2 (m = 5 tests):
Raw p-value | Rank | FDR-adjusted p (raw p × 5 / rank) |
0.010 | 1 | 0.050 |
0.012 | 2 | 0.030 |
0.060 | 3 | 0.100 |
0.300 | 4 | 0.375 |
0.500 | 5 | 0.500 |
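One likely explanation: p × m / rank is only the first step. In the Benjamini-Hochberg step-up procedure as commonly implemented (for example, R's `p.adjust(method = "BH")`), the adjusted p-value at rank i is the smallest value of p × m / rank over all ranks j ≥ i, capped at 1, so adjusted p-values never decrease as raw p-values increase. The Python sketch below assumes that definition; `bh_adjust` is a hypothetical helper name, not a library function.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values (step-up, with cumulative minimum)."""
    m = len(pvalues)
    # Indices of the p-values sorted from largest to smallest.
    order = sorted(range(m), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * m
    running_min = 1.0  # also caps every adjusted value at 1
    for k, i in enumerate(order):
        rank = m - k  # rank of this p-value in ascending order
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.012, 0.06, 0.3, 0.5]
print([round(p, 3) for p in bh_adjust(raw)])  # [0.03, 0.03, 0.1, 0.375, 0.5]
```

Under this definition the first test's adjusted p-value becomes 0.03, the same as the second, so at FDR = 0.05 the two smallest p-values would both be significant. This agrees with the equivalent rule of rejecting the k smallest p-values, where k is the largest rank with p ≤ (rank / m) × 0.05: here 0.010 ≤ 0.01 and 0.012 ≤ 0.02, but 0.06 > 0.03, 0.3 > 0.04 and 0.5 > 0.05.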