
  • adjusting for multiple testing in logistic regression models?

    [Attached image: PastedGraphic-1.jpg, showing one of the models described below]

    Hi all,

    I am unsure whether I need to adjust for multiple testing (e.g. by switching to 99% confidence intervals) in my logistic regression models.

    I am also unsure what ‘counts’ as multiple testing. I have many logistic regression models (around 10), each fitted to a different slice of the data - I don’t think that in itself counts as multiple testing?

    Within each model - is it the total number of variables that counts as multiple testing?

    Or is it the number of ‘levels’ within a category (e.g. age split up into 5 categories) and the number of comparisons you make between them?

    Just to give you an idea of my models: the attached image shows one of them, and I have about 10 of these (they are individual models in their own right, looking at different slices of the data, so I don’t compare the models with each other). I am primarily looking at the impact of life events (financial and social events in the example below) on the depression outcome while controlling for all the other variables in the model, i.e. I am just interested in how the estimates for the life events change when all the variables are included in the model.
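
    To make the question concrete, here is a minimal sketch (Python/statsmodels, with synthetic data and made-up variable names, not my actual data set) of what one common adjustment would look like if the ‘family’ were taken to be the ~10 separate models: a Bonferroni-style widening of the confidence intervals to 1 - 0.05/10 = 99.5%. Whether the right ‘family’ is the 10 models, the predictors within a model, or the pairwise contrasts between category levels is exactly what I am asking about.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Synthetic stand-in data so the sketch runs end to end (variable names are invented)
    rng = np.random.default_rng(0)
    n = 500
    df = pd.DataFrame({
        "financial_event": rng.integers(0, 2, n),   # hypothetical life-event indicator
        "social_event":    rng.integers(0, 2, n),   # hypothetical life-event indicator
        "age_band":        rng.integers(1, 6, n),   # age split into 5 categories
    })
    linpred = -1 + 0.8 * df["financial_event"] + 0.4 * df["social_event"]
    df["depressed"] = rng.binomial(1, 1 / (1 + np.exp(-linpred)))

    m = smf.logit("depressed ~ financial_event + social_event + C(age_band)", data=df).fit(disp=0)

    n_models = 10                   # the ~10 separate models described above
    alpha_adj = 0.05 / n_models     # Bonferroni across models: 0.005, i.e. 99.5% intervals
    print(np.exp(m.conf_int(alpha=alpha_adj)))   # confidence intervals on the odds-ratio scale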

    Thank you!!

  • #2
    You probably will not like my answer. Hopefully others, with differing points of view, will also respond.

    This is the soft underbelly of significance testing. Your question actually has no answer that anybody can rigorously defend, although many people will offer you many different opinions.

    Historically, correcting p-values for multiple tests began with post-hoc testing following ANOVA. You would get a significant F-test for some predictor variable that had several levels, and then you would run a bunch of tests contrasting the different levels with each other, "to see where the difference comes from." Everyone realized that the experiment-wide Type I error rate explodes well past the nominal 0.05 level as you increase the number of comparisons you test. So various approaches to correcting for multiple testing and preserving the experiment-wide Type I error rate at 0.05 were developed, Bonferroni and Scheffé to name just two.
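
    Purely to illustrate what that kind of correction does (the p-values below are invented, and this is Python's statsmodels rather than anything specific to your setup), a Bonferroni adjustment effectively multiplies each p-value by the number of tests in the family:

    from statsmodels.stats.multitest import multipletests

    pvals = [0.012, 0.034, 0.048, 0.20, 0.003]   # made-up p-values from, say, 5 pairwise contrasts
    reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="bonferroni")
    print(p_adj)    # each p-value multiplied by 5, capped at 1.0
    print(reject)   # which contrasts still reject at a family-wise alpha of 0.05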

    But in principle you have the same problem with multiple models fitted to the same data set. And you have the same problem with the same model applied to different subsets of the data. The Type I error rate explodes. In theory, you should correct for the number of tests, counting every single test you have done along the way, including ones whose results you choose not to report.

    In practice, this injunction is mainly honored in the breach, because the number of tests done is usually so large that seldom would anybody be able to declare a statistically significant result in any study, and often so many tests have been done that everybody involved has lost count. And what if a colleague and I are both analyzing the data? Each of us should also correct for the tests done by the other, even though we might not know about all of them. But also we should not double-correct for any tests that we have both done. The whole protocol is unwieldy. The sad truth is that people tend to wing it, with the consequence that there are no consistently followed guidelines, and nobody really knows what to make of the reported results. Some would say that the "reproducibility crisis" in science is, in large part, also a consequence of this.
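
    A back-of-the-envelope calculation shows how fast this blows up. Assuming, for simplicity, independent tests each run at a nominal alpha of 0.05 (tests on the same data are usually correlated, which changes the exact numbers but not the overall picture):

    # Chance of at least one false positive across m independent tests at alpha = 0.05
    alpha = 0.05
    for m in (1, 5, 10, 20, 50):
        fwer = 1 - (1 - alpha) ** m
        print(f"{m:3d} tests -> P(at least one Type I error) = {fwer:.2f}")

    With 10 tests the chance of at least one spurious "significant" finding is already about 40%, and with 50 tests it is over 90%.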

    My own practice, unless forced to do otherwise by reviewers, is to report nominal ("uncorrected") p-values and to clearly disclaim that they have not been "corrected" for multiple-hypothesis testing. I leave it to the reader to make of that what he or she will. I think it is better to lay my ignorance bare rather than pretend to wisdom. As a reader and reviewer, in most circumstances, I don't take p-values very seriously for this same reason, among others. I much prefer looking at effect sizes and estimates of their precision (typically confidence intervals) to hypothesis tests in most settings.

    Things are somewhat better, in my opinion, if you are reporting the results of a pre-registered study. In that case, you report as results all and only those tests that were included in the pre-registered protocol, and you can correct for exactly that number of tests. If you did other tests on top of that, you don't report them as results, though if they are interesting, you might mention them in footnotes or appendices or other ways that make clear that they are not actual results and have no standing beyond preliminary, exploratory findings in need of confirmation in a separate study.
