Statistical Inference

By John Hallman - April 6, 2021

This week we discuss Model-X Knockoffs by Emmanuel Candes and company, an extension of their previous paper Fixed-X Knockoffs which we covered a few months back.

**Materials**

- Model-X Knockoffs
- Original Fixed-X Knockoffs (for context)

**Why Model-X Knockoff filters?**

- Knockoffs is a family of new and exciting approaches for converting *any* variable selection procedure into one that controls the false discovery rate (FDR).
- The initial Fixed-X (FX) Knockoffs approach only handled linear models with Gaussian noise, however.
- Model-X (MX) Knockoffs extends this to arbitrary joint distributions $P(X_1, \ldots, X_p, Y)$ for relatively low amounts of work by modifying the assumptions and requirements of the knockoff variables and variable scoring procedures.
- The above is crucial. FX alone doesn't cover Sisu's use cases since business metrics involve not just scalar values but also categorical values.
- The biggest limitation of this paper is that it doesn't cover *how* to generate good MX knockoffs; it merely states the requirements for MX to provide FDR control.
- That said, future papers address the limitations above. Stay tuned for more in future reading groups...

**Nuggets**

- There are 3 components to MX Knockoffs, as with FX, although with slightly different assumptions and results: (1) the knockoff generation procedure, (2) the feature scoring procedure, and (3) the threshold selection procedure.
(1) Knockoff generation — for MX Knockoffs to work, the knockoff variables must be generated such that the *exchangeability property* and *conditional independence* hold w.r.t. the original and knockoff variables:

$[X, \tilde{X}] \overset{d}{=} [X, \tilde{X}]_{swap(S)}$

$\tilde{X} \perp y \mid X$

Here $swap(S)$ refers to swapping the columns of the original and knockoff variables for each index $j \in S$. Note that the paper says little about how to generate $\tilde{X}$ satisfying the above, which is easier said than done.
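One case where exchangeable knockoffs can be sampled exactly is the multivariate Gaussian: if $X \sim N(0, \Sigma)$, then $\tilde{X} \mid X$ is itself Gaussian with a closed-form mean and covariance parameterized by a vector $s$. A minimal sketch (function name and the jitter term are our own; $s$ must be chosen so the conditional covariance is positive semidefinite):

```python
import numpy as np

def gaussian_knockoffs(X, Sigma, s, rng=None):
    """Sample Gaussian Model-X knockoffs.

    Assumes the rows of X are i.i.d. N(0, Sigma) and s is a vector for
    which 2*diag(s) - diag(s) @ inv(Sigma) @ diag(s) is PSD.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    S = np.diag(s)
    Sigma_inv = np.linalg.inv(Sigma)
    # Conditional mean: X - X @ Sigma^{-1} @ diag(s)
    mu = X - X @ Sigma_inv @ S
    # Conditional covariance: 2*diag(s) - diag(s) @ Sigma^{-1} @ diag(s)
    V = 2 * S - S @ Sigma_inv @ S
    L = np.linalg.cholesky(V + 1e-10 * np.eye(p))  # small jitter for stability
    return mu + rng.standard_normal((n, p)) @ L.T
```

Larger $s_j$ makes knockoffs less correlated with the originals (hence more power); the equicorrelated choice caps all $s_j$ at $2\lambda_{\min}(\Sigma)$.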

(2) Feature scoring — for MX Knockoffs to work, the feature scoring procedures $w_j : (X, \tilde{X}, y) \rightarrow \mathbb{R}$ for each variable $j$ must be *anti-symmetric*, meaning $w_j([X, \tilde{X}]_{swap(S)}, y) = \pm w_j([X, \tilde{X}], y)$, with $+$ if $j \notin S$ and $-$ if $j \in S$. The paper finds that the lasso coefficient difference (and most other lasso-based scoring mechanisms) works quite well here.
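The lasso coefficient difference needs a lasso solver, but the anti-symmetry property itself is easy to see with a simpler statistic. A sketch using the marginal correlation difference $W_j = |X_j^\top y| - |\tilde{X}_j^\top y|$ (the function name is our own; this is an illustrative statistic, not the paper's recommended one):

```python
import numpy as np

def marginal_corr_stats(X, X_knock, y):
    """W_j = |X_j . y| - |X~_j . y|.

    Anti-symmetric: swapping column j with its knockoff flips the sign
    of W_j while leaving every other W_k unchanged.
    """
    return np.abs(X.T @ y) - np.abs(X_knock.T @ y)
```

A real variable should beat its knockoff ($W_j > 0$); a null variable is equally likely to land on either side of zero, which is exactly what the thresholding step exploits.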

(3) Threshold selection — if (1) and (2) hold, then we can select variables while controlling FDR at level $q$ by picking all variables whose feature scores $W_j = w_j([X, \tilde{X}], y)$ are above the threshold

$\tau = \min\left\{ t > 0 : \frac{1 + |\{ j : W_j \leq -t \}|}{|\{ j : W_j \geq t \}|} \leq q \right\}$

Intuitively, this controls FDR because we can use the number of variables with negative feature scores as an approximation for the number of false discoveries, since this corresponds to a knockoff variable being considered "more significant" than the original variable.
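The threshold search above can be sketched in a few lines: scan the candidate thresholds $|W_j|$ from smallest to largest and stop at the first one where the estimated false discovery proportion drops below $q$ (function names are our own):

```python
import numpy as np

def knockoff_threshold(W, q):
    """Smallest t among the |W_j| whose estimated FDP is <= q."""
    for t in np.sort(np.abs(W[W != 0])):
        # Numerator counts knockoffs beating originals; the +1 is the
        # correction that makes the FDR guarantee exact.
        fdp = (1 + np.sum(W <= -t)) / max(1, np.sum(W >= t))
        if fdp <= q:
            return t
    return np.inf  # no feasible threshold: select nothing

def knockoff_select(W, q):
    """Indices of variables passing the knockoff filter at level q."""
    return np.where(W >= knockoff_threshold(W, q))[0]
```

Note how a strict $q$ can make the threshold infeasible, in which case the filter returns an empty selection rather than risk exceeding the FDR target.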

- Tangential to MX knockoffs, this paper also briefly mentions Conditional Randomization Testing (CRT), which can be considered an MX knockoffs lite.
Briefly, given $n$ samples $(X_{i1}, \ldots, X_{ip}, Y_i)$, and a feature statistic $T_j(X, y)$ for some index $j \in [p]$, consider the following procedure for accepting/rejecting variable $j$:

For $k = 1, \ldots, K$, create a new matrix $X^{(k)}$ by replacing each entry $X_{ij}$ of the $j$th column with a new value $X_{ij}^{(k)}$ sampled from the distribution $P(X_{ij} \mid X_{i,-j})$.

Then, compute the (one-sided) p-value

$P_j = \frac{1}{K+1}\left[ 1 + \sum_{k=1}^K \mathbf{1}\{T_j(X^{(k)}, y) \geq T_j(X, y)\} \right]$

Informally, we measure the regularized proportion of times the original matrix $X$ scores lower w.r.t. $T_j$ than the "knockoff" matrices $X^{(k)}$.

This procedure is interesting because it's a much more straightforward approach to knockoff variable selection, although it doesn't deal with variable correlation and generally suffers from performance issues.
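The CRT loop is short enough to write out directly. A sketch, where `sample_cond` (a user-supplied sampler for $P(X_j \mid X_{-j})$) and `T` (the feature statistic) are hypothetical callables standing in for whatever model of the covariates you actually have:

```python
import numpy as np

def crt_pvalue(X, y, j, sample_cond, T, K=200, rng=None):
    """Conditional randomization test p-value for variable j.

    sample_cond(X, j, rng) returns a fresh draw of column j from
    P(X_j | X_-j); T(X, y, j) is any feature statistic where larger
    values mean more evidence that variable j matters.
    """
    rng = np.random.default_rng(rng)
    t_obs = T(X, y, j)
    count = 0
    for _ in range(K):
        Xk = X.copy()
        Xk[:, j] = sample_cond(X, j, rng)  # resample only column j
        if T(Xk, y, j) >= t_obs:
            count += 1
    # The +1 in numerator and denominator makes the p-value valid
    # (never exactly 0) for any finite K.
    return (1 + count) / (K + 1)
```

The performance issue is visible in the structure: each of the $p$ variables requires $K$ resampled datasets and $K$ refits of the statistic, whereas the knockoff filter pays for a single augmented fit on $[X, \tilde{X}]$.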


*If you like applying these kinds of methods to practical ML problems, join our team.*