This week, we tackle the multiple comparisons problem: when testing many hypotheses at once, randomness in the evaluation procedure means some true nulls will look significant by chance alone. False discovery rate (FDR) control provides a different framework for what we should optimize for in this setting, while still controlling the error incurred from multiplicity.
Martingale proof of Benjamini-Hochberg (BH) from Storey, Taylor, and Siegmund 2004 (for this reading, set $\lambda = 0$ and only focus on Theorems 1 and 2)
- Classical statistical control for false discoveries, such as controlling the family-wise error rate (FWER), constrains the count of false discoveries in an experimental procedure directly: the probability of making even one false discovery must be small.
- Strategies for FWER like Holm-Bonferroni are very conservative when many hypotheses are tested. By changing the objective from a count to a rate, namely requiring that the expected proportion of false discoveries among all discoveries made be small, Benjamini and Hochberg were able to find a more powerful method.
- The kind of control differs, but it is still an interpretable guarantee that may be useful to experimenters, so it's important to keep FDR in mind when looking for discoveries in settings where a few false positives are tolerable.
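To make the power difference concrete, here is a minimal sketch comparing the Holm step-down rule with the BH step-up rule on the same p-values. The function names and the ten p-values are hypothetical, chosen only for illustration, and the code uses nothing beyond the Python standard library.

```python
def holm_rejections(pvals, alpha):
    """Holm step-down: controls FWER at level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    rejected = []
    for rank, i in enumerate(order):  # rank = 0, 1, ..., m - 1
        # The (rank + 1)-th smallest p-value is compared to alpha / (m - rank).
        if pvals[i] <= alpha / (m - rank):
            rejected.append(i)
        else:
            break  # step-down stops at the first failure
    return sorted(rejected)


def bh_rejections(pvals, alpha):
    """Benjamini-Hochberg step-up: targets FDR at level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_star = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k_star = rank  # keep the largest rank that passes
    return sorted(order[:k_star])


# Hypothetical p-values, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
holm = holm_rejections(pvals, alpha=0.05)
bh = bh_rejections(pvals, alpha=0.05)
print("Holm rejects indices:", holm)
print("BH rejects indices:  ", bh)
```

On this input Holm rejects only the smallest p-value, while BH also rejects the second smallest, since its threshold grows with the rank instead of staying near `alpha / m`.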
STS Theorem 1. Mean-based estimator upper bounds FDR.
- Suppose we rejected p-values at a constant level $t$. For $m$ hypotheses we'd expect at most $mt$ false rejections (with equality if they're all null). Then simply dividing this average by $R(t)$, the number of rejected hypotheses with p-values at most $t$, gives the estimator $\widehat{\mathrm{FDR}}(t) = mt / (R(t) \vee 1)$, whose expectation bounds the actual average FDR $\mathbb{E}[V(t) / (R(t) \vee 1)]$, where $V(t)$ is the true number of rejected nulls, by a convexity argument and Jensen's inequality.
- Based on the estimator above, one reasonable guess for an adaptive rejection rate that controls FDR at level $\alpha$ would be $t_\alpha = \sup\{t \in [0, 1] : \widehat{\mathrm{FDR}}(t) \le \alpha\}$, since we're controlling an upper bound.
- By a visual proof (see the notes), rejecting at level $t_\alpha$ is equivalent to the usual BH algorithm.
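The equivalence can also be checked numerically. The sketch below computes the estimator `m * t / max(R(t), 1)` at a fixed threshold, picks the largest threshold whose estimate is at most `alpha`, and compares the number of rejections with the BH step-up rule. It relies on one assumption worth stating: because the estimate grows linearly in `t` between consecutive p-values, only thresholds of the form `alpha * k / m` need to be searched. All function names and the simulated p-value mixture are hypothetical.

```python
import random


def fdr_hat(pvals, t):
    """Conservative FDR estimate m * t / max(R(t), 1) at a fixed threshold t."""
    m = len(pvals)
    r = sum(p <= t for p in pvals)  # R(t): number of p-values at or below t
    return m * t / max(r, 1)


def estimator_threshold(pvals, alpha, tol=1e-12):
    """Largest candidate threshold whose FDR estimate is at most alpha.

    Candidates are alpha * k / m for k = 1..m; tol only guards against
    floating-point rounding in the comparison.
    """
    m = len(pvals)
    candidates = [alpha * k / m for k in range(1, m + 1)]
    feasible = [t for t in candidates if fdr_hat(pvals, t) <= alpha + tol]
    return max(feasible)  # alpha / m is always feasible, so this is non-empty


def bh_count(pvals, alpha):
    """Number of rejections made by the usual BH step-up rule."""
    m = len(pvals)
    k_star = 0
    for k, p in enumerate(sorted(pvals), start=1):
        if p <= alpha * k / m:
            k_star = k
    return k_star


rng = random.Random(0)
alpha = 0.1
agree = True
for _ in range(200):
    # Hypothetical mixture: 15 uniform nulls plus 5 small "signal" p-values.
    pvals = [rng.random() for _ in range(15)] + [rng.random() * 0.01 for _ in range(5)]
    t_alpha = estimator_threshold(pvals, alpha)
    n_est = sum(p <= t_alpha for p in pvals)
    agree = agree and (n_est == bh_count(pvals, alpha))
print("estimator rule matches BH on every draw:", agree)
```

This is a sanity check on simulated inputs, not a substitute for the visual proof in the notes.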
STS Theorem 2. BH actually controls FDR when p-values are independent.
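- This isn't trivial because Theorem 1 only holds for nonrandom $t$, whereas the BH threshold $t_\alpha$ is chosen from the data.
- Let $V(t)$ be the number of true nulls rejected at level $t$ and $m_0$ the number of true nulls. The ratio $V(t) / (m_0 t)$ between true type-I errors and their average is a martingale in reverse time as $t$ falls from 1 to 0, due to the independence mentioned above and the uniformity of the null p-values. As a result, by the Optional Stopping Theorem, $\mathbb{E}[V(t_\alpha) / (m_0 t_\alpha)] = \mathbb{E}[V(1) / m_0] = 1$.
- Noticing that, with $R(t)$ the total number of rejections at level $t$ among $m$ hypotheses, we can rewrite the true FDR as

  $$\mathrm{FDR} = \mathbb{E}\left[\frac{V(t_\alpha)}{R(t_\alpha) \vee 1}\right] = \mathbb{E}\left[\frac{m\, t_\alpha}{R(t_\alpha) \vee 1} \cdot \frac{V(t_\alpha)}{m\, t_\alpha}\right],$$

  since the first term is uniformly dominated by $\alpha$ by construction and the second has expectation $m_0 / m \le 1$, we have control!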
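A small Monte Carlo run can illustrate the conclusion. The sketch below assumes independent p-values with uniform nulls, which is the setting of the theorem; the alternative distribution, the sample sizes, and the function names are hypothetical choices made only for the illustration. It averages the realized false discovery proportion of BH and compares it with `alpha * m0 / m`.

```python
import random


def bh_rejections(pvals, alpha):
    """Indices rejected by the BH step-up rule at level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k_star = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k_star = rank
    return set(order[:k_star])


def simulate_fdr(m0, m1, alpha, n_reps, seed=0):
    """Average false discovery proportion of BH over n_reps simulated experiments."""
    rng = random.Random(seed)
    total_fdp = 0.0
    for _ in range(n_reps):
        nulls = [rng.random() for _ in range(m0)]           # uniform null p-values
        signals = [rng.random() * 0.01 for _ in range(m1)]  # hypothetical alternatives
        pvals = nulls + signals  # indices 0 .. m0 - 1 are the true nulls
        rejected = bh_rejections(pvals, alpha)
        v = sum(1 for i in rejected if i < m0)  # false discoveries
        total_fdp += v / max(len(rejected), 1)  # FDP, with 0/0 treated as 0
    return total_fdp / n_reps


alpha, m0, m1 = 0.1, 15, 5
fdr = simulate_fdr(m0, m1, alpha, n_reps=20000)
bound = alpha * m0 / (m0 + m1)
print(f"simulated FDR: {fdr:.4f}, alpha * m0 / m: {bound:.4f}, alpha: {alpha}")
```

Up to simulation noise, the estimate should sit near `alpha * m0 / m` and below `alpha`, matching the bound from the martingale argument.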
BH Martingale Notes.pdf
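Note on the PDF: the definition of a stopping time $\tau$ is that $\{\tau \le t\} \in \mathcal{F}_t$ for every $t$, not $\{\tau = t\} \in \mathcal{F}_t$.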
If you like applying these kinds of methods to practical ML problems, join our team.