Statistical Inference

Benjamini-Hochberg | August 17, 2020

By Vlad Feinberg

This week, we tackle the multiple comparisons problem: when testing many hypotheses at once, randomness in the evaluation procedure means some will look significant by chance. False discovery rate (FDR) control provides a new framework for what we should optimize for in this setting, while still controlling the error incurred from multiplicity.


Martingale proof of Benjamini-Hochberg (BH) from Storey, Taylor, and Siegmund 2004 (for this reading, set $\lambda = 0$ and focus only on Theorems 1 and 2)

Why Benjamini-Hochberg?

  • Classical statistical control for false discoveries, such as controlling the family-wise error rate (FWER), demands bounding the chance of making even a single false discovery across the entire experimental procedure.
  • Strategies for FWER like Holm-Bonferroni are very conservative, but by changing the objective from a count to a rate, namely, that the proportion of false discoveries among all discoveries made should be small, Benjamini and Hochberg were able to find a more powerful method.
  • FDR control differs in kind from FWER control, but it is still an interpretable guarantee that may be useful to experimenters, so it's important to keep FDR in mind when looking for discoveries in more lenient settings.
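To see the power gap concretely, here is a minimal simulation sketch (not from the reading; the Beta alternative distribution and all parameters are arbitrary choices) comparing Bonferroni's FWER-controlling threshold against BH's step-up rule on the same p-values:

```python
import numpy as np

rng = np.random.default_rng(42)
m, m0, alpha = 1000, 900, 0.05  # hypothetical: 900 true nulls, 100 signals
p = np.concatenate([
    rng.uniform(size=m0),               # true nulls: Uniform(0, 1) p-values
    rng.beta(0.05, 5.0, size=m - m0),   # signals: p-values piled up near 0
])

# Bonferroni: reject p <= alpha / m (controls FWER).
bonf = int((p <= alpha / m).sum())

# BH step-up: largest k with p_(k) <= alpha * k / m (controls FDR).
sorted_p = np.sort(p)
ok = np.nonzero(sorted_p <= alpha * np.arange(1, m + 1) / m)[0]
bh = int(ok[-1]) + 1 if len(ok) else 0

print(bonf, bh)  # BH never rejects fewer hypotheses than Bonferroni
```

Since any p-value below $\alpha/m$ also satisfies the BH condition at its rank, BH's rejection set always contains Bonferroni's; the simulation just shows how much larger it typically is.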


STS Theorem 1. Mean-based estimator upper bounds FDR.

  • Suppose we rejected p-values at a constant level $t$. For $m$ hypotheses we'd expect $mt$ rejections at most (if they're all null). Then simply dividing this average by $R_t$, the number of rejected hypotheses with p-values less than $t$, bounds the actual average FDR $\mathbb{E}[V_t/R_t]$, where $V_t$ is the true number of rejected nulls, by a convexity argument and Jensen's inequality.
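A minimal sketch of this plug-in estimator (the function name and the clamp of $R_t$ to at least 1 are my own conveniences, not from the paper):

```python
import numpy as np

def fdr_hat(pvals, t):
    # Estimated FDR at a fixed level t: under the all-null worst case we
    # expect at most m*t false rejections, so m*t / R_t upper-bounds the
    # false discovery proportion in expectation.
    m = len(pvals)
    R_t = max(int((pvals <= t).sum()), 1)  # clamp to avoid division by zero
    return m * t / R_t
```

For example, with p-values (0.001, 0.01, 0.02, 0.5) and $t = 0.05$, we get $R_t = 3$ rejections and an estimated FDR of $4 \cdot 0.05 / 3 \approx 0.067$.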

BH Procedure

  • Based on the estimator above, $\mathbb{E}\hat{\mathrm{FDR}}_t = \mathbb{E}[mt/R_t] \ge \mathbb{E}[V_t/R_t]$, one reasonable guess for an adaptive rejection rate that controls FDR at level $\alpha$ would be $\hat t = \sup\{t \mid \hat{\mathrm{FDR}}_t \le \alpha\}$, since we're controlling an upper bound.
  • By a visual proof (see the notes), rejecting at level $\hat t$ is equivalent to the usual BH algorithm.
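The usual BH algorithm referred to here is the step-up procedure; a sketch (function name mine), where rejecting every p-value at or below the returned threshold matches rejecting at level $\hat t$:

```python
import numpy as np

def bh_threshold(pvals, alpha):
    # BH step-up: find the largest rank k with p_(k) <= alpha * k / m,
    # then reject every hypothesis whose p-value is at most p_(k).
    m = len(pvals)
    sorted_p = np.sort(pvals)
    ok = np.nonzero(sorted_p <= alpha * np.arange(1, m + 1) / m)[0]
    return sorted_p[ok[-1]] if len(ok) else 0.0
```

For example, with p-values (0.01, 0.02, 0.03, 0.9) and $\alpha = 0.1$, the largest qualifying rank is $k = 3$ (since $0.03 \le 0.1 \cdot 3/4$), so everything up to 0.03 is rejected.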

STS Theorem 2. BH actually controls FDR: $\mathrm{FDR} = \mathbb{E}[V_{\hat t}/R_{\hat t}] \le \alpha$ when p-values are independent.

  • This isn't trivial because Theorem 1 only holds for nonrandom $t$.
  • The ratio $M_t = V_t/(mt)$ of true type-I errors to their all-null expectation is a martingale in reverse time as $t$ falls from 1 to 0, thanks to the independence mentioned above. As a result, by the Optional Stopping Theorem, $\mathbb{E}M_{\hat t} = \mathbb{E}M_1 \le 1$.
  • Noticing that we can rewrite the true FDR as $\mathrm{FDR} = \mathbb{E}[\hat{\mathrm{FDR}}_{\hat t} M_{\hat t}]$, and that the first factor is uniformly bounded by $\alpha$ by construction while the second has expectation at most 1, we get $\mathrm{FDR} \le \alpha\,\mathbb{E}[M_{\hat t}] \le \alpha$: we have control!
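Theorem 2's guarantee can be checked empirically with a small Monte Carlo sketch (the alternative distribution and all parameters are arbitrary choices of mine): across repeated experiments with independent p-values, the average false discovery proportion of BH should land at or below $\alpha$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, m0, alpha, trials = 100, 80, 0.1, 2000  # 80 true nulls, 20 signals
fdps = []
for _ in range(trials):
    p = np.concatenate([
        rng.uniform(size=m0),              # true nulls: Uniform(0, 1)
        rng.beta(0.1, 10.0, size=m - m0),  # signals: concentrated near 0
    ])
    order = np.argsort(p)
    sorted_p = p[order]
    ok = np.nonzero(sorted_p <= alpha * np.arange(1, m + 1) / m)[0]
    if len(ok) == 0:
        fdps.append(0.0)  # no rejections: FDP defined as 0
        continue
    rejected = order[: ok[-1] + 1]
    V = int((rejected < m0).sum())  # true nulls occupy the first m0 indices
    fdps.append(V / len(rejected))

print(np.mean(fdps))  # empirical FDR; should not exceed alpha
```

In fact the proof gives the sharper bound $\alpha \cdot m_0/m$ (here $0.08$), which is what the simulation tends to hover around.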

Raw Notes

BH Martingale Notes.pdf


Definition of a stopping time $\tau$: $\mathbb{E}[\mathbf{1}\{\tau \le t\} \mid \mathcal{F}_t] = \mathbf{1}\{\tau \le t\}$, i.e., the event $\{\tau \le t\}$ is $\mathcal{F}_t$-measurable — it's the indicator that must be measurable, not $\tau$ itself.

If you like applying these kinds of methods to practical ML problems, join our team.
