Statistical Inference

Model-X knockoffs | April 6, 2021

By John Hallman - April 6, 2021

This week we discuss Model-X Knockoffs by Emmanuel Candes and company, an extension of their earlier Fixed-X Knockoffs paper, which we covered a few months back.

Materials

Why Model-X Knockoff filters?

  • Knockoffs is a family of new and exciting approaches for converting any variable selection procedure into one that controls the false discovery rate (FDR).
  • The initial Fixed-X (FX) Knockoffs approach, however, only dealt with linear models with Gaussian noise.
  • Model-X (MX) Knockoffs extends this to arbitrary joint distributions $P(X_1, \ldots, X_p, Y)$ with relatively little extra work, by modifying the assumptions and requirements placed on the knockoff variables and the variable scoring procedures.
  • This extension is crucial: FX alone doesn't cover Sisu's use cases, since business metrics involve categorical values as well as scalar ones.
  • The biggest limitation of this paper is that it doesn't cover how to generate good MX knockoffs. It merely states the requirements the knockoffs must meet for MX to provide FDR control.
  • That said, later papers address this limitation. Stay tuned for more in future reading groups...
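
For reference, the FDR that knockoffs control is usually defined as the expected fraction of false discoveries among the selected variables. The notation here is ours rather than the post's: $\hat{S}$ is the set of selected variables and $\mathcal{H}_0$ is the set of null (irrelevant) variables.

$$\mathrm{FDR} = \mathbb{E}\left[\frac{|\hat{S} \cap \mathcal{H}_0|}{\max(|\hat{S}|, 1)}\right]$$

Controlling FDR at level $q$ means guaranteeing $\mathrm{FDR} \le q$; the $\max(\cdot, 1)$ in the denominator simply makes the ratio zero when nothing is selected.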

Nuggets

  • There are 3 components to MX Knockoffs, as with FX, although with slightly different assumptions and results: (1) the knockoff generation procedure, (2) the feature scoring procedure, and (3) the threshold selection procedure.
    • (1) Knockoff generation: for MX Knockoffs to work, the knockoff variables must be generated so that the following exchangeability and conditional independence properties hold for the original and knockoff variables:

      $$P([X, \tilde{X}]) \overset{d}{=} P([X, \tilde{X}]_{\mathrm{swap}(S)})$$

      $$\tilde{X} \perp y \mid X$$

    • Here $\mathrm{swap}(S)$ refers to swapping the columns of the original and knockoff variables at each index $j \in S$. Note that the paper says little about how to generate an $\tilde{X}$ that satisfies both conditions, which is easier said than done.
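
To make the swap operation concrete, here is a minimal NumPy sketch. The helper name `swap_columns` and the toy data are illustrative assumptions, not something from the paper. The toy knockoffs are simply an independent redraw of each feature, which satisfies both conditions only in the special case where the features are mutually independent with a known distribution and the knockoffs are drawn without looking at y.

```python
import numpy as np


def swap_columns(X, X_knockoff, S):
    """Return [X, X_knockoff]_{swap(S)} as an (n, 2p) array.

    For each index j in S, column j of X is exchanged with column j
    of X_knockoff; all other columns stay where they are.
    """
    X = np.asarray(X)
    X_knockoff = np.asarray(X_knockoff)
    if X.shape != X_knockoff.shape:
        raise ValueError("X and X_knockoff must have the same shape")
    p = X.shape[1]
    joint = np.hstack([X, X_knockoff])  # columns 0..p-1 are X, p..2p-1 are knockoffs
    swapped = joint.copy()
    for j in S:
        swapped[:, j] = joint[:, j + p]
        swapped[:, j + p] = joint[:, j]
    return swapped


# Toy data (assumption): independent standard normal features.
rng = np.random.default_rng(0)
n, p = 5, 3
X = rng.normal(size=(n, p))
# Independent redraw: a valid knockoff ONLY because the features are independent.
X_knockoff = rng.normal(size=(n, p))

swapped = swap_columns(X, X_knockoff, S=[0, 2])
```

Exchangeability says that `swapped` and `np.hstack([X, X_knockoff])` have the same joint distribution for every choice of `S`. With correlated features an independent redraw no longer works, which is why knockoff generation is the hard part.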

    • (2) Feature scoring: for MX Knockoffs to work, the feature scoring procedures $w_j : (X, \tilde{X}, y) \rightarrow \mathbb{R}$