Statistical Inference

Model-X knockoffs | April 6, 2021

By John Hallman - April 6, 2021

This week we discuss Model-X Knockoffs by Emmanuel Candès and coauthors, an extension of the earlier Fixed-X Knockoffs paper, which we covered a few months back.


Why Model-X Knockoff filters?

  • Knockoffs is a family of new and exciting approaches for converting any variable selection procedure into one that controls the false discovery rate (FDR).
  • The initial Fixed-X (FX) Knockoffs approach only handled linear models with Gaussian noise, however.
  • Model-X (MX) Knockoffs extends this to arbitrary joint distributions P(X_1, \ldots, X_p, Y) with relatively little extra work, by modifying the assumptions and requirements of the knockoff variables and variable scoring procedures.
  • The above is crucial: FX alone doesn't cover Sisu's use cases, since business metrics involve not just scalar values but also categorical ones.
  • The biggest limitation of this paper is that it doesn't cover how to generate good MX knockoffs; it merely states the requirements for MX to provide FDR control.
  • That said, future papers address the limitations above. Stay tuned for more in future reading groups...


  • There are 3 components to MX Knockoffs, as with FX, although with slightly different assumptions and results: (1) the knockoff generation procedure, (2) the feature scoring procedure, and (3) the threshold selection procedure.
    • (1) Knockoff generation — for MX Knockoffs to work, the knockoff variables must be generated such that the exchangeability and conditional independence properties hold w.r.t. the original and the knockoff variables:

      [X, \tilde{X}] \overset{d}{=} [X, \tilde{X}]_{\text{swap}(S)}

      \tilde{X} \perp y \mid X

    • Here \text{swap}(S) refers to swapping columns between the original and the knockoff variables for each index j \in S. Note that the paper says little about how to generate \tilde{X} satisfying the above, which is easier said than done.
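For the special case where X is jointly Gaussian, exact MX knockoffs can be sampled from the conditional Gaussian \tilde{X} \mid X \sim N(X - D\Sigma^{-1}(X - \mu),\ 2D - D\Sigma^{-1}D) with D = \text{diag}(s). A minimal numpy sketch of this construction — assuming \Sigma is a correlation matrix and using a simple equicorrelated choice of s; the function name is illustrative, not from the paper:

```python
import numpy as np

def gaussian_knockoffs(X, mu, Sigma, rng=None):
    """Sample exact MX knockoffs for rows of X drawn i.i.d. from N(mu, Sigma).

    Uses the conditional-Gaussian construction:
        X_tilde | X ~ N(X - D Sigma^{-1} (X - mu), 2D - D Sigma^{-1} D),
    where D = diag(s). Here s uses the simple equicorrelated recipe,
    assuming Sigma is a correlation matrix (an assumption of this sketch).
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    # Equicorrelated choice: s_j = min(1, 2 * lambda_min(Sigma)) for all j,
    # which keeps the conditional covariance positive semidefinite.
    lam_min = np.linalg.eigvalsh(Sigma).min()
    s = np.full(p, min(1.0, 2.0 * lam_min))
    D = np.diag(s)
    Sigma_inv = np.linalg.inv(Sigma)
    # Row-wise conditional mean: x - D Sigma^{-1} (x - mu).
    cond_mean = X - (X - mu) @ (Sigma_inv @ D)
    cond_cov = 2 * D - D @ Sigma_inv @ D
    # Draw X_tilde = mean + Z L^T, where L L^T = cond_cov (tiny jitter
    # guards against the boundary case where cond_cov is singular).
    L = np.linalg.cholesky(cond_cov + 1e-10 * np.eye(p))
    return cond_mean + rng.standard_normal((n, p)) @ L.T
```

Note that when the features are independent (\Sigma = I), this reduces to resampling each column independently from its marginal, as one would expect.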

    • (2) Feature scoring — for MX Knockoffs to work, the feature scoring procedures w_j : (X, \tilde{X}, y) \rightarrow \mathbb{R} for each variable j must be anti-symmetric, which means that w_j([X, \tilde{X}]_{\text{swap}(S)}, y) = \pm w_j([X, \tilde{X}], y), with + if j \notin S and - if j \in S.

    • The paper finds that lasso difference (and most other lasso-based scoring mechanisms) works quite well here.
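The lasso-difference statistic fits a lasso on the augmented matrix [X, \tilde{X}] and sets W_j = |\hat{\beta}_j| - |\hat{\beta}_{j+p}|. As a dependency-free illustration of the required anti-symmetry, here is the even simpler marginal-correlation difference statistic, which has the same shape with inner products in place of lasso coefficients (the helper name is my own):

```python
import numpy as np

def marginal_corr_diff(X, X_tilde, y):
    """Anti-symmetric feature statistic W_j = |X_j . y| - |X_tilde_j . y|.

    Swapping column j between X and X_tilde exchanges the two terms,
    flipping the sign of W_j while leaving every other W_k unchanged --
    exactly the anti-symmetry property required above.
    """
    return np.abs(X.T @ y) - np.abs(X_tilde.T @ y)
```

A large positive W_j means the original variable looks far more predictive than its knockoff; under the null the sign of W_j is a coin flip, which is what the threshold rule in (3) exploits.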

    • (3) If (1) and (2) hold, then we can select variables while controlling FDR at level q by picking all variables whose feature scores W_j = w_j([X, \tilde{X}], y) are above the threshold

      \tau = \min\left\{ t > 0 : \frac{1 + |\{ j : W_j \leq -t \}|}{|\{ j : W_j \geq t \}|} \leq q \right\}

    • Intuitively, this controls FDR because we can use the number of variables with negative feature scores as an approximation for the number of false discoveries, since this corresponds to a knockoff variable being considered "more significant" than the original variable.
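The threshold search above can be sketched in a few lines of numpy. Candidate values of t are taken from the nonzero |W_j|; the max(1, ·) guard against an empty selection set is a defensive addition of this sketch, and the function names are illustrative:

```python
import numpy as np

def knockoff_threshold(W, q):
    """Smallest t among the nonzero |W_j| such that the estimated FDP
    (1 + #{j : W_j <= -t}) / #{j : W_j >= t} is at most q."""
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf  # no threshold achieves the target: select nothing

def knockoff_select(W, q):
    """Indices of the variables whose scores clear the threshold."""
    return np.where(W >= knockoff_threshold(W, q))[0]
```

The "+1" in the numerator is what distinguishes the knockoff+ procedure, which controls the FDR itself rather than a modified version of it.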

  • Tangential to MX knockoffs, this paper also briefly mentions Conditional Randomization Testing (CRT), which can be considered an MX knockoffs lite.
    • Briefly, given n samples (X_{i1}, \ldots, X_{ip}, Y_i) and a feature statistic T_j(X, y) for some index j \in [p], consider the following procedure for accepting/rejecting variable j:

    • For k = 1, \ldots, K, create a new matrix X^{(k)} by replacing each entry X_{ij} of the jth column with a new value X_{ij}^{(k)} sampled from the conditional distribution P(X_{ij} \mid X_{i,-j}).

    • Then, compute the (one-sided) p-value

      P_j = \frac{1}{K+1}\left[ 1 + \sum_{k=1}^K \mathbf{1}\{ T_j(X^{(k)}, y) \geq T_j(X, y) \} \right]

    • Informally, P_j is the (regularized) proportion of resampled matrices X^{(k)} that score at least as high as the original matrix X w.r.t. T_j.

    • This procedure is interesting because it's a much more straightforward approach to knockoff-style variable selection, although it doesn't deal with variable correlation and is computationally expensive, since each variable requires K resampled refits.
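The CRT loop above is short enough to sketch directly. This assumes access to a sampler for P(X_j | X_{-j}) — the key MX requirement — which the caller must supply; function and parameter names are illustrative:

```python
import numpy as np

def crt_pvalue(X, y, j, sample_cond, T, K=100, rng=None):
    """Conditional randomization test p-value for variable j.

    sample_cond(X, j, rng) must return a fresh draw of column j from
    P(X_j | X_{-j}) -- knowing this conditional is the MX assumption.
    T(X, y) is any feature statistic (larger = more significant).
    """
    rng = np.random.default_rng(rng)
    t_obs = T(X, y)
    exceed = 0
    for _ in range(K):
        Xk = X.copy()
        Xk[:, j] = sample_cond(X, j, rng)  # resample only column j
        if T(Xk, y) >= t_obs:
            exceed += 1
    # The "+1"s make the p-value valid (never exactly zero) at finite K.
    return (1 + exceed) / (K + 1)
```

With independent features the conditional sampler is just the marginal of X_j, which makes the toy case easy to test; for correlated features the sampler is exactly the hard part the later knockoff-generation papers address.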


If you like applying these kinds of methods to practical ML problems, join our team.
