Fixed-X knockoffs | November 9, 2020

By John Hallman

This week we cover a novel method for false discovery rate (FDR) control in variable selection: the Fixed-X knockoff filter by Rina Barber and Emmanuel Candès.


Why knockoff filters?

  • Knockoff filters are a new family of procedures that control FDR, and what's really cool is that they generally work under a very different model than traditional procedures. In particular, the Model-X knockoff filters that build on fixed-X knockoffs work for a much broader range of distributions and do not even require the calculation of p-values! In exchange, they require greater knowledge about the generating distribution of the covariates.
  • Knockoff filters were a candidate for our FDR control procedure earlier this year, and we may still find a use case for them in the future.


  • Theorem 3: The following Knockoff+ procedure, given in the paper above, controls FDR at a given level $q$:
    • For features $X = [X_1, \ldots, X_p]$ with Gram matrix $X^{\top} X = \Sigma$, where $X$ is normalized so that $\Sigma_{jj} = 1$ for all $j$, construct a knockoff $\tilde{X}_j$ for each $X_j$ such that:
      • $\tilde{X}^{\top} \tilde{X} = \Sigma$
      • $X^{\top} \tilde{X} = \Sigma - \text{diag}(s)$ for some vector $s$.
      • Intuition: we want to construct knockoff features that mimic the correlation structure of the original variables, while removing their effect on the response $y$.
    • Compute test statistics $W = (W_1, \ldots, W_p)$, one for each (feature, knockoff) pair, with the following properties:
      • Sufficiency: $W$ depends on $X, y$ only via the Gram matrix and the feature–response inner products, i.e. $\exists f$ such that $W = f([X, \tilde{X}]^{\top} [X, \tilde{X}], [X, \tilde{X}]^{\top} y)$.
      • Anti-symmetry: $W_j([X, \tilde{X}]_{\text{swap}(S)}, y) = \pm W_j([X, \tilde{X}], y)$, with $+$ if $j \notin S$ and $-$ if $j \in S$, where for any subset $S \subseteq [p]$, $[X, \tilde{X}]_{\text{swap}(S)}$ denotes swapping each column $X_j$ with its knockoff $\tilde{X}_j$ for all $j \in S$.
      • Intuition: the above properties ensure that for null features, our test statistics cannot distinguish the original features from their knockoffs, and in particular the sign of the statistic is $\pm 1$ with equal probability.
    • Compute the threshold $T = \min \left\{ t \in W : \frac{1 + |\{j : W_j \leq -t\}|}{|\{j : W_j \geq t\}| \lor 1} \leq q \right\}$ and select all features for which $W_j \geq T$.
      • Intuition: by selecting variables while controlling for the number of negative statistics, we control the number of selected null variables, due to the intuition explained above.
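The three steps above can be sketched in NumPy. This is a minimal illustration rather than the authors' implementation: it assumes $n \geq 2p$ and full-rank $X$ with unit-norm columns, uses the equi-correlated construction $s_j = \min(1, 2\lambda_{\min}(\Sigma))$ from the paper, and leaves the choice of statistic to the caller (a simple valid choice is $W_j = |X_j^{\top} y| - |\tilde{X}_j^{\top} y|$, which satisfies both sufficiency and anti-symmetry).

```python
import numpy as np

def fixed_x_knockoffs(X, rng=None):
    """Equi-correlated fixed-X knockoffs. Assumes n >= 2p and unit-norm
    columns, so Sigma = X^T X has Sigma_jj = 1 for all j."""
    rng = np.random.default_rng(rng)
    n, p = X.shape
    Sigma = X.T @ X
    # Equi-correlated choice: s_j = min(1, 2 * lambda_min(Sigma)) for all j
    s = min(1.0, 2.0 * np.linalg.eigvalsh(Sigma)[0])
    S = s * np.eye(p)
    Sigma_inv = np.linalg.inv(Sigma)
    # C^T C = 2S - S Sigma^{-1} S; take a symmetric matrix square root
    A = 2 * S - S @ Sigma_inv @ S
    w, V = np.linalg.eigh(A)
    C = (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T
    # U: n x p orthonormal columns, orthogonal to the column span of X
    Q, _ = np.linalg.qr(np.hstack([X, rng.standard_normal((n, p))]))
    U = Q[:, p:]
    # Satisfies X_tilde^T X_tilde = Sigma and X^T X_tilde = Sigma - S
    return X @ (np.eye(p) - Sigma_inv @ S) + U @ C

def knockoff_plus_threshold(W, q):
    """Smallest t among the nonzero |W_j| whose estimated FDP is <= q."""
    for t in np.sort(np.abs(W[W != 0])):
        if (1 + np.sum(W <= -t)) / max(np.sum(W >= t), 1) <= q:
            return t
    return np.inf  # no threshold achieves level q: select nothing
```

Given statistics `W`, the selected set is then `np.flatnonzero(W >= knockoff_plus_threshold(W, q))`; returning `np.inf` when no candidate threshold works corresponds to selecting no features.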

If you like applying these kinds of methods to practical ML problems, join our team.
