Statistical Inference

By Barak Oshri - January 25, 2021

This week we discussed the error control theorem in Stability Selection, a finite sample control technique for structure estimation.

**Materials**

- Paper by Nicolai Meinshausen & Peter Buehlmann

**Why stability selection?**

- The estimation of model structure in high dimensional settings is an extremely common statistical problem, found for example in supervised learning, gaussian graphical modelling, and clustering. Each of these settings have their own structure estimation techniques.
- Stability selection addresses the problem of proper regularization in any structure estimation setting using a generic subsampling approach that conservatively controls the type I error rate in finite samples.
- False discovery control with such generality is achieved mainly by assuming the exchangeability assumption and a "better than random guessing" condition.

**Nuggets**

The main result in the paper is an error control bound for the expected number of false discoveries when using stability selection.

- Assume that stability selection is no worse than random guessing, which establishes a simple bound on the expected number of false positives.
- By assuming exchangeability, the probability of selecting any single false positive is distributed evenly among all the false positives.
- Using Markov's inequality, we can establish an upper bound on the probability of selecting a false positive in the simultaneous selection setting when the probability of selecting it in an individual trial is low (Lemma 2).
- Converting simultaneous selection probabilities into individual trial probabilities (Lemma 1) pays in the denominator of the final bound.

Lemma 1 (Lower bound for simultaneous selection probabilities)

- The stability selection formalization is written in terms of subsample probabilities but the math in the error control proof relies on simultaneous selection probabilities, so this lemma is used to establish a lower bound.
- The probability that some set of variables is in the selected set from a single-split subsample is equivalent to the probability that it is in both selected sets of simultaneous splits or in one of them.
- Using some simple inequalities that the probability of the variables not being selected in both splits is non-negative and that the probability of being selected in one simultaneous sample and not the other is smaller than not being selected in a single subsample, we get the result.

Lemma 2 (Markov-like inequality for lower bounding the probability of selecting false positives)

- This is the main mechanic to establish stability selection's error guarantees, relating the the choice of selection threshold to the probability of selectin false positives.
- In this proof, $K$ is a stand-in for false positives. By assuming that the probability of selecting $K$ is less than some small $\epsilon$, we establish a lower bound on the simultaneous selection probability as $\epsilon^2$.
- Apply Markov's inequality with the upper bound being some $\xi$, the selection threshold, to the inequality to obtain the desired result.

**Raw notes:**

*If you like applying these kinds of methods practical ML problems, join our team.*