Benjamini Hochberg

pierre martin

Join Date: Nov 2017

Posts: 72
#1

Benjamini Hochberg

29 Apr 2026, 07:14

Hi all,

I have built the following linear mixed model to understand the association of accelerated biological aging (grimAge) with pulmonary function assessed annually. Instead of using time as the time-scale, I used chronological age incremented by 1 year at each annual visit.

mixed pulmonary c.chronologicalage##c.chronologicalage##c.grimage i.sexe || id: chronologicalage, residuals(ar 1, t(chronologicalage))

The coefficients and p values I presented in my paper are the ones for the interaction terms [grimage * chronological age] and then [grimage * squared chronological age].

I did similar models with the other existing biological aging clocks so basically same thing but instead of grim age biological age, it was the biological age provided by another clock.

Given that the models are repeated 6 times (I used 6 different biological aging clocks), I am asked to correct with Benjamini Hochberg.

I completely understand how it works if I only had one beta coefficient and a p value but here given that for each clock I presented a beta (and p value) for the interaction term with chronological age and then for the interaction term with chronological age squared, I am struggling with providing the corrected p values with Benjamini Hochberg.

Any advice would be extremely helpful to me. Thank you very much for your time.
Best, Pierre
Tags: None
Daniel Schaefer

Join Date: Mar 2020

Posts: 842
#2

29 Apr 2026, 11:48

I'm not really familiar with the procedure, but maybe I can help you think this through regardless? The Benjamini Hochberg procedure was first described here. The procedure corrects for the multiple comparisons problem, which happens when multiple hypothesis tests are estimated on the same dataset. Multiple simultaneous tests will increase the false positive rate when you interpret the p-values. The Benjamini Hockberg procedure (described here and here) is therefore related to the Bonferroni correction.

You say you understand the procedure when you have one coefficient and p-value (I assume you mean one coefficient across multiple models), but you are not sure how to preform the correction with the quadratic and interaction terms. It's not clear what the issue is from your perspective, but in my mind there are two potential issues.

(1) Should the quadratic term and the interaction term both be included in the adjustment procedure, or should they be handled separately across models? Fundamentally, this question relates to what constitutes a family of related tests. The definition is a little vague, but my reading is that you should include all the coefficients of interest across all of the models. It's actually a little surprising to me that the reviewer suggest doing the procedure across models rather than strictly within models, not because it isn't typically done (from what I can tell, it is), but because the original paper assumes variables are independent at the top of section 4.1. You make the same assumption of independence within a regression, but not necessarily across regression models with the same dependent variable and overlap in the set of independent variables. This raises another issue.

(2) How should one deal with the multicollinearity between the first-order, quadratic, and interaction terms? This seems like a more substantial problem. Benjamini Hockberg assumes that variables are not positively related. You can find a great thread discussing this here, but in short you may need another adjustment described in this paper.

Keep in mind, the lower type 1 error rate comes at the cost of an increased type 2 error rate, and you are already adjusting for the clustered data structure with your mixed model. I think your choice here with respect to adjustments should be driven by the substantive scientific and social costs here. Are there substantive reasons why it is essential that you do not mistakenly conclude an effect exists when it does not? Are the benefits worth the risk that you might misidentify important predictors as not significant?
1 like
Comment

George Ford

Join Date: Aug 2014
Posts: 3337

30 Apr 2026, 10:22

might be useful.

Code:

clear all
set seed 12345

sysuse auto, clear

matrix P = J(6,1,.)

forv i = 1/6 {
    quietly reg mpg c.weight##c.weight if runiform() < 0.9
    quietly margins, dydx(weight)
    matrix P[`i',1] = r(table)[4,1]
}

mat li P

clear
svmat P, names(p)
gen test = _n

qqvalue p1, method(holland) qvalue(q_bh)

list test p1 q_bh, noobs

sort p1
gen rank = _n
gen crit = rank/_N * 0.05   // alpha = 0.05

list p1 crit

Comment

pierre martin

Join Date: Nov 2017

Posts: 72
#4

03 May 2026, 15:43

Hi all, thank you for your comments.

I am still wondering with what I did below if I should consider 12 p values to correct with Benjamini Hochberg (6 models = different biological aging clocks) with 1 coefficient interaction biological aging * chronological age and one coefficient interaction biological aging * squared chronological age ? (this is how I presented my findings) OR if I should calculate for each model one p value summarizing both interactions with linear and quadratic age and then correct with Benjamini Hochberg only 6 p values (one for each model = biological aging clock considered).

Do you have any advice? Below is what I did. Thank you all!

Best, Pierre

--

I have built the following linear mixed model to understand the association of accelerated biological aging (grimAge) with pulmonary function assessed annually. Instead of using time as the time-scale, I used chronological age incremented by 1 year at each annual visit.

mixed pulmonary c.chronologicalage##c.chronologicalage##c.grimage i.sexe || id: chronologicalage, residuals(ar 1, t(chronologicalage))

The coefficients and p values I presented in my paper are the ones for the interaction terms [grimage * chronological age] and then [grimage * squared chronological age].

I did similar models with the other existing biological aging clocks so basically same thing but instead of grim age biological age, it was the biological age provided by another clock.

Given that the models are repeated 6 times (I used 6 different biological aging clocks), I am asked to correct with Benjamini Hochberg.

I completely understand how it works if I only had one beta coefficient and a p value but here given that for each clock I presented a beta (and p value) for the interaction term with chronological age and then for the interaction term with chronological age squared, I am struggling with providing the corrected p values with Benjamini Hochberg.

Any advice would be extremely helpful to me. Thank you very much for your time.
Best, Pierre
Comment

Announcement

Benjamini Hochberg

Comment

Comment

Comment