Statistical Inference

Model-X knockoffs | April 6, 2021

By John Hallman - April 6, 2021

This week we discuss Model-X Knockoffs by Emmanuel Candes and company, an extension of their earlier Fixed-X Knockoffs paper, which we covered a few months back.

Materials

Why Model-X Knockoff filters?

• Knockoffs are a family of new and exciting approaches for converting any variable selection procedure into one that controls the false discovery rate (FDR).
• The initial Fixed-X (FX) Knockoffs approach, however, only dealt with linear models with Gaussian noise.
• Model-X (MX) Knockoffs extends this to arbitrary joint distributions $P(X_1, \ldots, X_p, Y)$ with relatively little extra work, by modifying the assumptions and requirements placed on the knockoff variables and the variable scoring procedures.
• This generality is crucial. FX alone doesn't cover Sisu's use cases, since business metrics involve not just scalar values but also categorical values.
• The biggest limitation of this paper is that it doesn't cover how to generate good MX knockoffs. It merely states the requirements that the knockoffs must meet for MX to provide FDR control.
• That said, later papers address this limitation. Stay tuned for more in future reading groups...

Nuggets

• There are 3 components to MX Knockoffs, as with FX, although with slightly different assumptions and results: (1) the knockoff generation procedure, (2) the feature scoring procedure, and (3) the threshold selection procedure.
• (1) Knockoff generation: for MX Knockoffs to work, the knockoff variables must be generated such that the following exchangeability and conditional independence properties hold for the original and the knockoff variables:

$[X, \tilde{X}]_{swap(S)} \overset{d}{=} [X, \tilde{X}] \quad \text{for every } S \subseteq \{1, \ldots, p\}$

$\tilde{X} \perp y \: | \: X$

• Here $swap(S)$ means swapping column $X_j$ with its knockoff column $\tilde{X}_j$ for each index $j \in S$. Note that the paper says little about how to generate an $\tilde{X}$ that satisfies both conditions, and doing so is easier said than done.
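
To make the swap operation concrete, here is a minimal Python sketch. It assumes a special case that the notes above do not discuss: zero-mean, jointly Gaussian features with a known correlation matrix `Sigma`, where the standard Gaussian knockoff construction applies. The function names `swap_columns` and `gaussian_knockoffs`, the equicorrelated choice of `s`, and the 0.99 shrink factor are all illustrative choices, not something taken from the paper. Since everything is jointly Gaussian with mean zero, comparing covariance matrices before and after the swap is a reasonable empirical proxy for the exchangeability condition. The second condition holds by construction, because $\tilde{X}$ is sampled from $X$ alone without looking at $y$.

```python
import numpy as np

def swap_columns(X, X_tilde, S):
    """Return [X, X_tilde]_swap(S): exchange column j of X and X_tilde for each j in S."""
    X_sw, Xk_sw = X.copy(), X_tilde.copy()
    idx = list(S)
    X_sw[:, idx] = X_tilde[:, idx]
    Xk_sw[:, idx] = X[:, idx]
    return np.hstack([X_sw, Xk_sw])

def gaussian_knockoffs(X, Sigma, rng):
    """Sample knockoffs for rows X_i ~ N(0, Sigma); Sigma is an ASSUMED known correlation matrix."""
    p = Sigma.shape[0]
    # Equicorrelated choice s = min(1, 2 * lambda_min(Sigma)), shrunk slightly
    # so that the conditional covariance below is strictly positive definite.
    s = 0.99 * min(1.0, 2.0 * np.linalg.eigvalsh(Sigma).min())
    D = s * np.eye(p)
    Sigma_inv_D = np.linalg.solve(Sigma, D)      # Sigma^{-1} D
    cond_mean = X - X @ Sigma_inv_D              # E[X_tilde | X], one row per sample
    cond_cov = 2.0 * D - D @ Sigma_inv_D         # Cov[X_tilde | X]
    L = np.linalg.cholesky(cond_cov)
    return cond_mean + rng.standard_normal(X.shape) @ L.T

# Toy check with an AR(1)-style correlation matrix (made-up example data).
rng = np.random.default_rng(0)
p, n = 4, 200_000
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
X_tilde = gaussian_knockoffs(X, Sigma, rng)

# Empirical covariance of [X, X_tilde] before and after swapping columns 0 and 2.
cov_orig = np.cov(np.hstack([X, X_tilde]), rowvar=False)
cov_swap = np.cov(swap_columns(X, X_tilde, S=[0, 2]), rowvar=False)
max_gap = np.abs(cov_orig - cov_swap).max()
print(f"largest covariance gap after swap: {max_gap:.4f}")
```

The gap should shrink toward zero as `n` grows. Outside the Gaussian setting, matching second moments alone would not establish exchangeability, so this check is only meaningful under the assumption stated above.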

• (2) Feature scoring — for MX Knockoffs to work, the feature scoring procedures $w_j : (X, \tilde{X}, y) \rightarrow \mathbb{R}$