
# Model-X Knockoffs

By John Hallman - April 6, 2021

This week we discuss Model-X Knockoffs by Emmanuel Candès and coauthors, an extension of their earlier Fixed-X Knockoffs paper, which we covered a few months back.


Why Model-X Knockoff filters?

• Knockoffs are a family of new and exciting approaches for converting any variable selection procedure into one that controls the false discovery rate (FDR).
• The initial Fixed-X (FX) Knockoffs approach only dealt with linear models and Gaussian noise, however.
• Model-X (MX) Knockoffs extends this to arbitrary joint distributions $P(X_1, \ldots, X_p, Y)$ with relatively little extra work by modifying the assumptions and requirements placed on the knockoff variables and the variable scoring procedures.
• The above is crucial. FX alone doesn't cover Sisu's use cases since business metrics involve not just scalar values but also categorical values.
• The biggest limitation to this paper is that it doesn't cover how to generate good MX knockoffs. It merely states the requirements for MX to provide FDR control.
• That said, future papers address the limitations above. Stay tuned for more in future reading groups...

Nuggets

• There are 3 components to MX Knockoffs, as with FX, although with slightly different assumptions and results: (1) the knockoff generation procedure, (2) the feature scoring procedure, and (3) the threshold selection procedure.
• (1) Knockoff generation — for MX Knockoffs to work, the knockoff variables must be generated such that the exchangeability property and independence holds w.r.t the original and the knockoff variables:

$[X, \tilde{X}] \overset{d}{=} [X, \tilde{X}]_{swap(S)}$

$\tilde{X} \perp y \: | \: X$

• Here $swap(S)$ refers to swapping the original and knockoff columns for each index $j \in S$. Note that the paper says little about how to generate $\tilde{X}$ satisfying the above, which is easier said than done.
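• One tractable special case, well known in the knockoffs literature, is jointly Gaussian features with known covariance $\Sigma$: exact knockoffs can be drawn from the Gaussian conditional implied by giving $(X, \tilde{X})$ the joint covariance $\begin{pmatrix}\Sigma & \Sigma - D\\ \Sigma - D & \Sigma\end{pmatrix}$ with $D = \mathrm{diag}(s)$. A minimal numpy sketch, where the equi-correlated choice of $s$ and the 0.99 shrinkage factor are our own illustrative choices:

```python
import numpy as np

def gaussian_knockoffs(X, Sigma, rng=None):
    """Sample exact Model-X knockoffs for rows X_i ~ N(0, Sigma).

    Equi-correlated construction: the joint covariance of (X, X_tilde) is
    [[Sigma, Sigma - D], [Sigma - D, Sigma]] with D = diag(s), so swapping
    any subset of coordinates with their knockoffs leaves the joint
    distribution unchanged.
    """
    rng = np.random.default_rng() if rng is None else rng
    p = Sigma.shape[0]
    # Equi-correlated choice s = min(1, 2 * lambda_min(Sigma)) (the cap at 1
    # is tailored to correlation matrices), shrunk slightly so the
    # conditional covariance below stays positive definite.
    lam_min = np.linalg.eigvalsh(Sigma).min()
    s = 0.99 * min(1.0, 2.0 * lam_min)
    D = s * np.eye(p)
    Sigma_inv = np.linalg.inv(Sigma)
    cond_mean = X - X @ Sigma_inv @ D       # E[X_tilde | X], row by row
    cond_cov = 2.0 * D - D @ Sigma_inv @ D  # Cov[X_tilde | X]
    L = np.linalg.cholesky(cond_cov)
    return cond_mean + rng.standard_normal(X.shape) @ L.T
```

• Independence $\tilde{X} \perp y \mid X$ holds automatically here because the sampler never looks at $y$.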

• (2) Feature scoring — for MX Knockoffs to work, the feature scoring procedures $w_j : (X, \tilde{X}, y) \rightarrow \mathbb{R}$ for each variable $j$ must be anti-symmetric, which means that $w_j([X, \tilde{X}]_{swap(S)}, y) = \pm w_j([X, \tilde{X}], y)$, with + if $j \notin S$ and - if $j \in S$.

• The paper finds that lasso difference (and most other lasso-based scoring mechanisms) works quite well here.
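• To make the anti-symmetry concrete, here is a small numpy sketch of the lasso coefficient-difference statistic $W_j = |\hat{b}_j| - |\hat{b}_{j+p}|$, where $\hat{b}$ is a lasso fit on the augmented design $[X, \tilde{X}]$. The basic proximal-gradient (ISTA) solver and the fixed penalty are our own simplifications, not the paper's setup:

```python
import numpy as np

def lasso_ista(A, y, lam, n_iter=1000):
    """Minimize (1/2n)||y - A b||^2 + lam * ||b||_1 by proximal gradient (ISTA)."""
    n = A.shape[0]
    step = n / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the gradient
    b = np.zeros(A.shape[1])
    for _ in range(n_iter):
        b = b - step * (A.T @ (A @ b - y) / n)                    # gradient step
        b = np.sign(b) * np.maximum(np.abs(b) - step * lam, 0.0)  # soft-threshold
    return b

def lasso_coef_diff(X, X_tilde, y, lam=0.05):
    """Lasso coefficient-difference statistic W_j = |b_j| - |b_{j+p}|.

    Antisymmetric by construction: swapping column j of X with column j of
    X_tilde swaps b_j and b_{j+p}, which flips the sign of W_j only.
    """
    p = X.shape[1]
    b = lasso_ista(np.hstack([X, X_tilde]), y, lam)
    return np.abs(b[:p]) - np.abs(b[p:])
```

• Swapping a column with its knockoff merely permutes columns of the augmented design, so the fitted coefficients permute accordingly and $W_j$ flips sign — exactly the anti-symmetry required above.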

• (3) If (1) and (2) hold, then we can select variables while controlling FDR at level $q$ by picking all variables whose feature scores $W_j = w_j([X, \tilde{X}], y)$ are above the threshold

$\tau = \min\left\{ t > 0 : \frac{1 + |\{ j : W_j \leq -t \}|}{\max(1, |\{ j : W_j \geq t \}|)} \leq q \right\}$

• Intuitively, this controls FDR because we can use the number of variables with negative feature scores as an approximation for the number of false discoveries, since this corresponds to a knockoff variable being considered "more significant" than the original variable.
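• The threshold and selection step can be sketched in a few lines of numpy (this uses the "knockoffs+" form with a $\max(1, \cdot)$ guard in the denominator to avoid division by zero; helper names are ours):

```python
import numpy as np

def knockoff_threshold(W, q=0.1):
    """Smallest t > 0 whose estimated FDP (knockoffs+ form) is at most q."""
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp_hat <= q:
            return t
    return np.inf  # no feasible threshold: select nothing

def knockoff_select(W, q=0.1):
    """Indices of selected variables: those with W_j >= tau."""
    return np.where(W >= knockoff_threshold(W, q))[0]
```

• For example, with $W = (6, 5, 4, 3, -1)$ and $q = 0.3$, the estimated FDP at $t = 1$ is $2/4 = 0.5$, but at $t = 3$ it drops to $1/4 = 0.25 \leq q$, so $\tau = 3$ and the first four variables are selected.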

• Tangentially to MX knockoffs, this paper also briefly mentions Conditional Randomization Testing (CRT), which can be thought of as a lightweight alternative to MX knockoffs.
• Briefly, given $n$ samples $(X_{i1}, \ldots, X_{ip}, Y_i)$, and a feature statistic $T_j(X, y)$ for some index $j \in [p]$, consider the following procedure for accepting/rejecting variable $j$:

• For $k = 1, \ldots, K$, create a new matrix $X^{(k)}$ by replacing each entry $X_{ij}$ of the $j$th column with a new value $X_{ij}^{(k)}$ sampled from the distribution $P(X_{ij} \mid X_{i,-j})$.

• Then, compute the (one-sided) p-value

$P_j = \frac{1}{K+1}\left[ 1 + \sum_{k=1}^K \mathbf{1}\{T_j(X^{(k)}, y) \geq T_j(X, y)\} \right]$

• Informally, $P_j$ is the add-one-smoothed proportion of resampled matrices $X^{(k)}$ that score at least as high as the original matrix $X$ w.r.t. $T_j$.

• This procedure is interesting because it is a much more straightforward route to knockoff-style variable selection, although it doesn't account for correlation between variables and is computationally expensive, since each variable requires $K$ refits of the statistic.
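• A toy sketch of the CRT loop, assuming we can sample from $P(X_{ij} \mid X_{i,-j})$; the `sample_cond` and `T` interfaces below are our own illustrative choices, not the paper's:

```python
import numpy as np

def crt_pvalue(X, y, j, sample_cond, T, K=200, rng=None):
    """One-sided CRT p-value for variable j.

    sample_cond(X, j, rng) draws a fresh column j from P(X_ij | X_i,-j);
    T(X, y) is any feature-importance statistic (larger = more important).
    """
    rng = np.random.default_rng() if rng is None else rng
    t_obs = T(X, y)
    count = 0
    for _ in range(K):
        Xk = X.copy()
        Xk[:, j] = sample_cond(X, j, rng)  # resample only column j
        count += T(Xk, y) >= t_obs
    return (1 + count) / (K + 1)
```

• With independent $N(0,1)$ features, for instance, one can take `sample_cond = lambda X, j, rng: rng.standard_normal(X.shape[0])` and let `T` be the absolute correlation between column $j$ and $y$.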


If you like applying these kinds of methods to practical ML problems, join our team.