
# The Synthetic Controls Method | July 6, 2020

By Arnaud Autef - July 6, 2020

In this week's discussion, we review the Synthetic Controls method, which extends the potential outcomes framework from the Causal Inference literature to time-dependent observational data.

Materials

Why Synthetic Controls?

• The Synthetic Control Method extends Potential Outcomes from Causal Inference literature to time-dependent observational data.

• This matters, because most real-world business and operational data falls in that category!

• E.g.:

Company A assigns promotional campaign P (aka the "treatment") to subpopulation X (aka the "treated unit") from time $t > T_0$.

→ With synthetic controls, we can answer the question:

What effect did campaign P have on subpopulation X, in terms of Y = average CLTV (customer long-term value)?

• Motivation for Sisu?

• The example above is a very likely use case that Sisu could encounter as we grow. For now, Sisu identifies interesting subpopulations X to target with a promotional campaign P. In the future, we might want to close the loop and soundly estimate the counterfactual effect of that campaign.

Nuggets

• Time-dependent potential outcomes model

• Time steps $t \in \{1,~...,~T\}$, distinct units $j \in \{1,~...,~J\}$, time-dependent metric of interest $Y_{j, t}$ for each unit. Potential outcomes for unit $j$ at time $t$ are written:

$\{Y_{j, t}^N,Y_{j,t}^I \}$

• Where:
• $Y_{j, t}^I$ = outcome of unit $j$ at time $t$ if it is treated.
• $Y_{j, t}^N$ = outcome of unit $j$ at time $t$ if it is not treated.
• Up to and including time step $T_0$, no unit has been treated, and observations follow the factorized time-series model:

$Y_{j, t} = Y_{j,t}^N = \delta_t + \theta_t^TZ_j + \lambda_t^T\mu_j + \epsilon_{j,t}$

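• After time $T_0$ (that is, for $t > T_0$), unit $j=1$ is treated and all other units remain untreated:

$Y_{j, t} = \begin{cases} Y_{1, t}^I & \text{if } j = 1 \text{ and } t > T_0 \\ Y_{j, t}^N & \text{otherwise} \end{cases}$
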
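To make the model concrete, here is a minimal NumPy sketch that draws a panel from the factor model above. The function name `simulate_panel`, the distributions, the dimensions, and the constant additive treatment effect are all assumptions chosen for illustration; the model itself does not prescribe them. Row `0` of the array plays the role of the treated unit $j = 1$.

```python
import numpy as np

def simulate_panel(J=20, T=60, T0=40, effect=2.0, n_cov=2, n_fac=2,
                   noise=0.1, seed=0):
    """Draw Y[j, t] from the factor model (hypothetical helper).

    Columns 0..T0-1 are the untreated periods t <= T0,
    columns T0..T-1 are the treated periods t > T0.
    """
    rng = np.random.default_rng(seed)
    delta = rng.normal(size=T)             # common time effect delta_t
    theta = rng.normal(size=(T, n_cov))    # time-varying coefficients theta_t
    lam = rng.normal(size=(T, n_fac))      # time-varying factor loadings lambda_t
    Z = rng.uniform(size=(J, n_cov))       # observed covariates Z_j
    mu = rng.uniform(size=(J, n_fac))      # unobserved unit factors mu_j
    eps = noise * rng.normal(size=(J, T))  # idiosyncratic noise epsilon_{j,t}

    # Untreated potential outcomes Y^N for every unit and time step.
    Y_N = delta[None, :] + Z @ theta.T + mu @ lam.T + eps

    # Observed outcomes: only unit 1 (row 0) is treated, and only for t > T0.
    # Assumption: the treatment adds a constant `effect` to Y^N.
    Y = Y_N.copy()
    Y[0, T0:] += effect
    return Y, Y_N

Y, Y_N = simulate_panel()
```

In real data only `Y` is available; `Y_N` is returned here so that an estimate can be compared against the true counterfactual.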
• Treatment effect definition and estimation

• Definition: Treatment effect $\tau_t$ for treated unit $j = 1$ from time $t = T_0 + 1$:

$\tau_t = Y_{1,t}^I - Y_{1, t}^N$

• But, by definition, only $Y_{1, t}^I$ is observed for $t \ge T_0 + 1$!
• Estimation strategy - Synthetic Controls Method:

• High-level idea:

• For $t \le T_0$ we observe $Y_{1, t}^N$

• Fit $Y_{1, t}^N \sim f_\theta(Y_{j>1,t}^N)$ on $t \le T_0$, get $\hat{\theta}$.
• For $t > T_0$ we still observe $Y_{j,t}^N$ for untreated units $j > 1$:

• Use the estimate $\hat{Y}_{1, t}^N \approx f_{\hat{\theta}}(Y_{j>1, t}^N)$
• So that the treatment effect estimate is:

$\hat{\tau}_t := Y_{1, t} - f_{\hat{\theta}}(Y_{j>1,t})$

• In practice:

• Restrict the class of fitting functions $\{f_{\theta},~\theta \in \Theta\}$ to convex combinations of the untreated units $1 < j \le J$:
• $\Theta = \Delta^{J -1}$
• $\forall X,~f_{\theta}(X) = \sum_{j > 1}^{J}\theta_jX_j$
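The estimation strategy above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not the authors' implementation: the helper names `project_to_simplex` and `fit_weights`, the choice of projected gradient descent as the solver, and the toy data are all made up for the example. The toy treated series is built as an exact convex combination of the controls, so the true effect should be recovered.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def fit_weights(y1_pre, Y0_pre, n_iter=5000):
    """Least-squares fit of the treated unit on the controls, weights in the simplex.

    y1_pre: shape (T0,), treated unit before treatment.
    Y0_pre: shape (J - 1, T0), untreated units before treatment.
    """
    A = Y0_pre.T                               # (T0, J - 1)
    step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1 / Lipschitz constant of the gradient
    w = np.full(A.shape[1], 1.0 / A.shape[1])  # start from uniform weights
    for _ in range(n_iter):
        grad = A.T @ (A @ w - y1_pre)          # gradient of 0.5 * ||A w - y||^2
        w = project_to_simplex(w - step * grad)
    return w

# Toy data (assumption): 5 controls, 30 time steps, treatment after T0 = 20.
rng = np.random.default_rng(0)
T0, true_effect = 20, 3.0
Y0 = rng.normal(size=(5, 30))
w_true = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
y1 = w_true @ Y0
y1[T0:] += true_effect

w_hat = fit_weights(y1[:T0], Y0[:, :T0])       # fit on t <= T0 only
tau_hat = y1[T0:] - w_hat @ Y0[:, T0:]         # estimated effect for t > T0
```

Any solver for constrained least squares would do here; projected gradient descent is used only because it needs nothing beyond NumPy.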
• Main theoretical result:

• Under assumptions:

1. (~SUTVA) The treatment of unit $1$ has no indirect effect on units $j > 1$.

2. (Controls approximate the treated unit well) There exists $\boldsymbol w^* \in \Delta^{J -1}$ such that:

$\tag{1} \forall t \le T_0,~Y_{1, t} = Y_{1, t}^N = \sum_{j > 1}^{J}w^*_j Y_{j, t}$

$\tag{2} Z_1 = \sum_{j > 1}^{J}w^*_j Z_{j}$

3. (~No confounding) Noise terms $\epsilon_{j,t}$ are i.i.d. with mean $0$ and $\mathbb{E}(\epsilon_{j,t}|Z_j,\mu_j) = 0$

• The Synthetic Controls estimator is asymptotically unbiased (for large $J$, $T_0$):

$\mathbb{E}(|\tau_t - \hat{\tau}_t|) \rightarrow 0$
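To see where this result comes from, here is a sketch of the key algebra (not the full proof from the paper). Take any weights $w$ in the simplex, so that $\sum_{j > 1}^{J} w_j = 1$. For $t > T_0$ we have $Y_{1,t} = Y_{1,t}^N + \tau_t$ and, under assumption 1, $Y_{j,t} = Y_{j,t}^N$ for $j > 1$, so the estimation error equals the untreated fitting gap. Substituting the factor model for each $Y_{j,t}^N$, the common term $\delta_t$ cancels because the weights sum to one:

$\hat{\tau}_t - \tau_t = Y_{1,t}^N - \sum_{j > 1}^{J} w_j Y_{j,t}^N = \theta_t^T\Big(Z_1 - \sum_{j > 1}^{J} w_j Z_j\Big) + \lambda_t^T\Big(\mu_1 - \sum_{j > 1}^{J} w_j \mu_j\Big) + \sum_{j > 1}^{J} w_j(\epsilon_{1,t} - \epsilon_{j,t})$

With $w = w^*$, condition (2) removes the first term, and for fixed weights the noise term has mean zero by assumption 3. What remains is the mismatch on the unobserved factors $\mu_j$, which is what the pre-treatment fit in condition (1) has to control.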

• Practical considerations

BIGGEST CAVEAT: convex combinations are a very restrictive class of approximating functions. If no convex combination of the controls fits the treated unit well, estimated treatment effects can be heavily polluted by the bias of the fit → the method requires a good control group.

• Model estimation

To fit the weights $w^*$ of the convex combination, the authors advise using regularization and validation (if there is enough data).

• Inference

How do we assess the significance of the estimated treatment effects? → In the original paper, the authors propose "placebo tests": are the treatment effects estimated for treated unit $1$ much larger than the estimates we would have obtained by applying Synthetic Controls to each untreated unit $j > 1$?
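A placebo test along these lines can be sketched as follows. Everything specific here is an assumption made for illustration: the helper names, the projected gradient solver, the use of the mean absolute post-treatment gap as the test statistic, the rank-based p-value, and the choice to leave the treated unit out of the donor pool when fitting placebos. Other variants of each choice exist.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def fit_weights(y_pre, donors_pre, n_iter=2000):
    """Simplex-constrained least squares by projected gradient descent."""
    A = donors_pre.T
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    w = np.full(A.shape[1], 1.0 / A.shape[1])
    for _ in range(n_iter):
        w = project_to_simplex(w - step * (A.T @ (A @ w - y_pre)))
    return w

def post_gap(Y, unit, donors, T0):
    """Mean absolute gap after T0 between `unit` and its synthetic control."""
    w = fit_weights(Y[unit, :T0], Y[donors, :T0])
    return np.abs(Y[unit, T0:] - w @ Y[donors, T0:]).mean()

def placebo_test(Y, T0):
    """Row 0 is the treated unit; every other row is treated as a placebo in turn."""
    controls = list(range(1, Y.shape[0]))
    treated_stat = post_gap(Y, 0, controls, T0)
    placebo_stats = np.array([
        post_gap(Y, j, [k for k in controls if k != j], T0) for j in controls
    ])
    # Share of units (treated one included) with a gap at least as large.
    p_value = (1 + np.sum(placebo_stats >= treated_stat)) / (1 + len(controls))
    return treated_stat, placebo_stats, p_value

# Toy panel (assumption): two latent factors, a large constant effect on unit 1.
rng = np.random.default_rng(0)
J, T, T0, effect = 15, 40, 30, 5.0
lam = rng.normal(size=(T, 2))
mu = rng.uniform(size=(J, 2))
mu[0] = mu[1:].mean(axis=0)  # treated unit lies inside the controls' convex hull
Y = mu @ lam.T + 0.1 * rng.normal(size=(J, T))
Y[0, T0:] += effect

treated_stat, placebo_stats, p_value = placebo_test(Y, T0)
```

With $J - 1$ placebo units, the smallest p-value this procedure can return is $1/J$, so a small control group limits how much significance can be claimed.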

Raw Notes

If you like applying these kinds of methods to practical ML problems, join our team.