Statistical Inference

By John Hallman - November 9, 2020

This week we cover a novel method for False Discovery Rate control in variable selection — the Fixed-X Knockoff filter by Rina Barber and Emmanuel Candès.

**Materials**

- Controlling the false discovery rate via knockoffs - original paper
- Supplement to above - proofs for some of the theorems

**Why knockoff filters?**

- Knockoff filters are a new collection of procedures that control FDR, and what's really cool is that they generally work under a very different model than traditional procedures. In particular, the Model-X knockoff filters, which build on fixed-X knockoffs, work for a much broader range of distributions and do not even require the calculation of p-values! In exchange, they require greater knowledge about the generating distribution of the covariates.
- Knockoff filters were a candidate for our FDR control procedure earlier this year, and we may still find a use case for them in the future.

**Nuggets**

- Theorem 3: The following Knockoff+ procedure given in the paper above controls FDR at a given level $q$:
- For features $X = [X_1, \ldots, X_p]$, where $X^{\top}X = \Sigma$ and $X$ normalized so $\Sigma_{jj} = 1, \forall j$, construct knockoffs $\tilde{X}_j$ for each $X_j$ such that:
- $\tilde{X}^{\top} \tilde{X} = \Sigma$
- $X^{\top} \tilde{X} = \Sigma - \text{diag}(s)$ for some vector $s$.
- Intuition: we want to construct knockoff features that mimic the correlation structure of the original variables, while removing their effect on the response $y$.
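A minimal NumPy sketch of one way to build such knockoffs, using the equi-correlated choice of $s$ from the paper (the function name and the small shrinkage factor are my own; it assumes $n \geq 2p$ and a full-rank, column-normalized $X$):

```python
import numpy as np

def fixed_x_knockoffs(X, seed=0):
    """Equi-correlated fixed-X knockoffs; assumes n >= 2p, full-rank X,
    columns normalized so that diag(X^T X) = 1."""
    n, p = X.shape
    Sigma = X.T @ X
    # Equi-correlated choice: s_j = min(1, 2 * lambda_min(Sigma)), shrunk
    # slightly so that 2*diag(s) - diag(s) Sigma^{-1} diag(s) stays positive
    # definite for the Cholesky factorization below.
    lam_min = np.linalg.eigvalsh(Sigma)[0]
    s = 0.999 * min(1.0, 2.0 * lam_min) * np.ones(p)
    S = np.diag(s)
    Sigma_inv_S = np.linalg.solve(Sigma, S)
    C = np.linalg.cholesky(2 * S - S @ Sigma_inv_S).T  # C^T C = 2S - S Sigma^{-1} S
    # U_tilde: an n x p orthonormal basis orthogonal to the column span of X,
    # obtained by QR-extending X with random directions.
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(np.hstack([X, rng.normal(size=(n, p))]))
    U_tilde = Q[:, p:]
    # By construction: X_tilde^T X_tilde = Sigma, X^T X_tilde = Sigma - diag(s).
    X_tilde = X @ (np.eye(p) - Sigma_inv_S) + U_tilde @ C
    return X_tilde, s
```

Expanding the products shows why this works: the cross terms vanish because $\tilde{U}^{\top}X = 0$, so $\tilde{X}^{\top}\tilde{X} = (I - S\Sigma^{-1})\Sigma(I - \Sigma^{-1}S) + C^{\top}C = \Sigma$, and $X^{\top}\tilde{X} = \Sigma - \text{diag}(s)$.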

- Compute test statistics $W = (W_1, \ldots, W_p)$ for each (feature, knockoff) pair with the following properties:
  - *Sufficiency:* $W$ only depends on $X, \tilde{X}, y$ via the Gram matrix and the feature–response inner products, i.e. $\exists f$ such that $W = f([X, \tilde{X}]^{\top} [X, \tilde{X}], [X, \tilde{X}]^{\top}y)$.
  - *Anti-symmetry:* $W_j([X, \tilde{X}]_{\text{swap}(S)}, y) = \pm W_j([X, \tilde{X}], y)$, with $+$ if $j \notin S$ and $-$ if $j \in S$, where for any subset $S \subseteq [p]$, $[X, \tilde{X}]_{\text{swap}(S)}$ denotes swapping each column/feature $X_j$ with its knockoff $\tilde{X}_j$ for all $j \in S$.
  - Intuition: the above properties ensure that for null features, our test statistics cannot distinguish the original features from their knockoffs, and in particular the sign of the statistic is $\pm 1$ with equal probability.
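One simple statistic satisfying both properties is the difference of absolute marginal correlations, $W_j = |X_j^{\top}y| - |\tilde{X}_j^{\top}y|$ (a minimal sketch with an illustrative function name; the paper's main example uses the Lasso path instead):

```python
import numpy as np

def marginal_corr_stats(X, X_tilde, y):
    # W_j = |X_j^T y| - |X_tilde_j^T y|.  Sufficiency: W depends on the data
    # only through [X, X_tilde]^T y.  Anti-symmetry: swapping X_j with
    # X_tilde_j exchanges the two absolute values, flipping the sign of W_j.
    return np.abs(X.T @ y) - np.abs(X_tilde.T @ y)
```

A large positive $W_j$ suggests the real feature is more associated with $y$ than its knockoff, as expected for a non-null.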

- Compute threshold $T = \min \left\{ t \in \mathcal{W} : \frac{1 + |\{j : W_j \leq -t\}|}{|\{j : W_j \geq t\}| \lor 1 } \leq q \right\}$, where $\mathcal{W} = \{|W_j| : j \in [p]\} \setminus \{0\}$ is the set of candidate thresholds (with $T = +\infty$ if no such $t$ exists), and select all features for which $W_j \geq T$.
- Intuition: by selecting variables while controlling for the number of negative statistics, we control the number of null variables due to the intuition explained above.
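The thresholding step above can be sketched as follows (the function name is illustrative):

```python
import numpy as np

def knockoff_plus_select(W, q):
    # Candidate thresholds: magnitudes of the nonzero statistics, ascending.
    for t in np.sort(np.abs(W[W != 0])):
        # Estimated FDP at threshold t; the "+1" gives the knockoff+ variant.
        fdp_hat = (1 + np.sum(W <= -t)) / max(np.sum(W >= t), 1)
        if fdp_hat <= q:
            return np.where(W >= t)[0]  # select features with W_j >= T
    return np.array([], dtype=int)  # no threshold achieves level q: T = +inf
```

The numerator counts negative statistics past $-t$, which by the sign-symmetry of the nulls estimates (plus one) the number of nulls among the selected set in the denominator.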



*If you like applying these kinds of methods to practical ML problems, join our team.*