Causal Inference

The Synthetic Controls Method | July 6, 2020

By Arnaud Autef - July 6, 2020

In this week's discussion, we review the Synthetic Controls method, which extends potential outcomes from the Causal Inference literature to time-dependent observational data.


Why Synthetic Controls?

  • The Synthetic Control Method extends Potential Outcomes from the Causal Inference literature to time-dependent observational data.

    • This matters, because most real-world business and operational data falls in that category!

    • E.g.:

      Company A assigns promotional campaign P (aka the "treatment") to subpopulation X (aka the "treated unit") from time $t \ge T_0$.

      → With synthetic controls, we can answer the question:

      What is the effect that this campaign P had on subpopulation X in terms of Y = average CLTV (customer long term value)?

  • Motivation for Sisu?

    • The example above is a very likely use case that Sisu could encounter as we grow. For now, Sisu identifies interesting subpopulations X to target with a promotional campaign P. In the future, we might want to close the loop and soundly estimate the counterfactual effect of this promotional campaign.


  • Time-dependent potential outcomes model

    • Time-steps $t \in \{1, \dots, T\}$, distinct units $j \in \{1, \dots, J\}$, time-dependent metric of interest $Y_{j,t}$ for each unit. Potential outcomes for unit $j$ at time $t$ are written:

      $$\{Y_{j,t}^N,\ Y_{j,t}^I\}$$

      • Where:
        • $Y_{j,t}^I$ = observation for a treated unit $j$ at time $t$.
        • $Y_{j,t}^N$ = observation for an untreated unit $j$ at time $t$.
    • Before time-step $T_0$, no unit has been treated, and observations follow the factorized time-series model:

      $$Y_{j,t} = Y_{j,t}^N = \delta_t + \theta_t^T Z_j + \lambda_t^T \mu_j + \epsilon_{j,t}$$

    • After time $T_0$ (for $t > T_0$), unit $j = 1$ is treated and the other units are kept untreated:

      $$Y_{1,t} = Y_{1,t}^I, \qquad Y_{j,t} = Y_{j,t}^N \ \text{ for } j > 1$$

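The model above can be made concrete with a small simulation. The sketch below is illustrative only: the function name `simulate_panel`, the Gaussian draws, the single scalar covariate $Z_j$, and the constant additive treatment effect are all assumptions made for the example, not part of the original notes. Python index `0` plays the role of unit $j = 1$, and 0-based time indices `t >= T0` correspond to the post-treatment periods $t > T_0$.

```python
import random


def simulate_panel(J=6, T=30, T0=20, effect=2.0, n_factors=2, noise=0.1, seed=0):
    """Draw Y[j][t] from the factor model; unit 0 is treated in the last T - T0 periods.

    Assumptions made for this sketch (not from the source): Gaussian draws,
    one scalar covariate per unit, and a constant additive treatment effect.
    """
    rng = random.Random(seed)
    # Time-varying terms: delta_t, theta_t and the factor loadings lambda_t.
    delta = [rng.gauss(0.0, 1.0) for _ in range(T)]
    theta = [rng.gauss(0.0, 1.0) for _ in range(T)]
    lam = [[rng.gauss(0.0, 1.0) for _ in range(n_factors)] for _ in range(T)]
    # Unit-specific terms: observed covariate Z_j and unobserved factors mu_j.
    Z = [rng.gauss(0.0, 1.0) for _ in range(J)]
    mu = [[rng.gauss(0.0, 1.0) for _ in range(n_factors)] for _ in range(J)]

    Y = []
    for j in range(J):
        row = []
        for t in range(T):
            # Untreated potential outcome Y^N_{j,t}.
            y_n = (delta[t] + theta[t] * Z[j]
                   + sum(l * m for l, m in zip(lam[t], mu[j]))
                   + rng.gauss(0.0, noise))
            treated = (j == 0 and t >= T0)
            # Observed outcome: Y^I for the treated unit after T0, Y^N otherwise.
            row.append(y_n + effect if treated else y_n)
        Y.append(row)
    return Y


panel = simulate_panel()
```

Because the same seed yields the same draws, calling the function twice with `effect=0.0` and `effect=2.0` gives panels that differ only in the treated unit's post-treatment outcomes, which is exactly the counterfactual comparison that is never available in real data.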
  • Treatment effect definition and estimation

    • Definition: Treatment effect $\tau_t$ for treated unit $j = 1$ from time $t = T_0 + 1$:

      $$\tau_t = Y_{1,t}^I - Y_{1,t}^N$$

      • But, by definition, only $Y_{1,t}^I$ is observed for $t \ge T_0 + 1$!
    • Estimation strategy - Synthetic Controls Method:

      • High-level idea:

        • For $t \le T_0$ we observe $Y_{1,t}^N$

          • Fit $Y_{1,t}^N \sim f_\theta(Y_{j>1,t}^N)$ on $t \le T_0$, get $\hat{\theta}$.
        • For $t > T_0$ we still observe $Y_{j,t}^N$ for untreated units $j > 1$:

          • Use the estimate $\hat{Y}_{1,t}^N \approx f_{\hat{\theta}}(Y_{j>1,t}^N)$
        • So that the treatment effect estimate is:

          $$\hat{\tau}_t := Y_{1,t} - f_{\hat{\theta}}(Y_{j>1,t})$$

      • In practice:

        • Restrict the class of fitting functions $\{f_\theta,~\theta \in \Theta\}$ to convex combinations of untreated units $1 < j \le J$
          • $\Theta = \Delta^{J-1}$
          • $\forall X,~f_\theta(X) = \sum_{j > 1}^{J} \theta_j X_j$
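The estimation strategy above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the fit is an unregularized least-squares fit on the pre-treatment periods, and the simplex constraint is handled with a Frank-Wolfe iteration with exact line search, chosen here only because it needs no external solver (a quadratic-programming solver would be the more usual choice).

```python
def fit_simplex_weights(y_pre, X_pre, n_iter=2000):
    """Least-squares fit of y_pre by a convex combination of control series.

    y_pre: pre-treatment outcomes of the treated unit (length T0).
    X_pre: one list of pre-treatment outcomes per control unit.
    Returns weights w with w_j >= 0 and sum(w) == 1 (Frank-Wolfe iterate).
    """
    n = len(X_pre)
    w = [1.0 / n] * n  # start at the centre of the simplex
    pred = [sum(w[j] * X_pre[j][t] for j in range(n)) for t in range(len(y_pre))]
    for _ in range(n_iter):
        resid = [p - y for p, y in zip(pred, y_pre)]
        # Gradient of the squared error w.r.t. each weight (up to a factor 2).
        grad = [sum(r * x for r, x in zip(resid, X_pre[j])) for j in range(n)]
        s = min(range(n), key=grad.__getitem__)  # most promising simplex vertex
        # Change in the prediction when moving all the way to vertex s.
        d = [x - p for x, p in zip(X_pre[s], pred)]
        denom = sum(v * v for v in d)
        if denom == 0.0:
            break
        # Exact line search for a quadratic objective, clipped to [0, 1].
        gamma = max(0.0, min(1.0, -sum(r * v for r, v in zip(resid, d)) / denom))
        if gamma == 0.0:
            break  # no feasible descent direction left
        w = [(1.0 - gamma) * wj for wj in w]
        w[s] += gamma
        pred = [p + gamma * v for p, v in zip(pred, d)]
    return w


def synthetic_control_effect(y_treated, X_controls, T0):
    """Fit weights on the first T0 periods, return (weights, effects after T0)."""
    w = fit_simplex_weights(y_treated[:T0], [x[:T0] for x in X_controls])
    synthetic = [sum(wj * x[t] for wj, x in zip(w, X_controls))
                 for t in range(len(y_treated))]
    tau_hat = [y - s for y, s in zip(y_treated[T0:], synthetic[T0:])]
    return w, tau_hat


# Toy, noise-free data (made up for illustration): the treated unit is an
# exact convex combination of two controls, plus an effect of 2.0 after T0 = 4.
controls = [[1.0, 2.0, 3.0, 4.0, 5.0, 6.0], [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]]
treated = [0.25 * a + 0.75 * b for a, b in zip(*controls)]
treated = treated[:4] + [y + 2.0 for y in treated[4:]]
weights, tau_hat = synthetic_control_effect(treated, controls, T0=4)
```

Only the pre-treatment slice is ever passed to the fit, so the post-treatment outcomes of the treated unit cannot leak into the weights. With noisy data or many controls, the number of iterations and the lack of regularization in this sketch would both need revisiting.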
  • Main theoretical result:

    • Under assumptions:

      1. (~SUTVA) The treatment of unit $1$ has no indirect effect on units $j > 1$.

      2. (Controls approximate the treated unit well) There exists $\boldsymbol{w}^* \in \Delta^{J-1}$ such that:

        $$\forall t \le T_0,~Y_{1,t} = Y_{1,t}^N = \sum_{j > 1}^{J} w^*_j Y_{j,t} \tag{1}$$

        $$Z_1 = \sum_{j > 1}^{J} w^*_j Z_j \tag{2}$$

      3. (~No confounding) Noise terms $\epsilon_{j,t}$ are i.i.d. with mean $0$ and $\mathbb{E}(\epsilon_{j,t} | Z_j, \mu_j) = 0$

    • The Synthetic Controls estimator is asymptotically unbiased (for large $J$, $T_0$):

      $$\mathbb{E}(|\tau_t - \hat{\tau}_t|) \rightarrow 0$$
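To see which terms the result has to control, it can help to substitute the factor model into the estimation error for an arbitrary weight vector $w$ on the simplex. The decomposition below is plain algebra on the model stated earlier, offered as a heuristic reading of the result and not as its proof; the notation $\hat{\tau}_t(w)$ for the estimator evaluated at fixed weights $w$ is introduced here for convenience.

```latex
% For t > T_0 and any weights w with w_j >= 0 and \sum_{j>1} w_j = 1
% (SUTVA gives Y_{j,t} = Y_{j,t}^N for the controls):
\hat{\tau}_t(w) = Y_{1,t} - \sum_{j>1}^{J} w_j Y_{j,t}
               = \tau_t + \Big( Y_{1,t}^N - \sum_{j>1}^{J} w_j Y_{j,t}^N \Big)

% Substituting the factor model; \delta_t cancels because the weights sum to 1:
Y_{1,t}^N - \sum_{j>1}^{J} w_j Y_{j,t}^N
  = \theta_t^T \Big( Z_1 - \sum_{j>1}^{J} w_j Z_j \Big)
  + \lambda_t^T \Big( \mu_1 - \sum_{j>1}^{J} w_j \mu_j \Big)
  + \Big( \epsilon_{1,t} - \sum_{j>1}^{J} w_j \epsilon_{j,t} \Big)
```

At $w = w^*$, condition (2) removes the covariate term, and for fixed weights the noise term has mean zero. What remains is the mismatch in the unobserved factors $\mu_j$, which is what the pre-treatment fit (1) is meant to keep small. The fitted weights also depend on the pre-treatment noise, so this decomposition alone does not establish the limit.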

  • Practical considerations

    BIGGEST CAVEAT: convex combinations are a very restrictive class of approximating functions. If no convex combination of the controls fits the treated unit well, estimated treatment effects can be heavily polluted by the bias of the fit → this requires a good control group.

    • Model estimation

      To fit proper weights $w^*$ for the convex combination, the authors advise using regularization and validation (if there is enough data).

    • Inference

      How do we estimate the significance of the estimated treatment effects? → In the original paper, the authors propose "Placebo tests": are the treatment effects estimated for treated unit $1$ much larger than the treatment effect estimates we would have obtained by applying Synthetic Controls to an untreated unit $j > 1$?
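The placebo comparison can be turned into a rank-based p-value. The sketch below assumes the effect estimates have already been computed, once for the treated unit and once for each control unit treated as if it were the treated one. The function names and the choice of the mean absolute post-treatment effect as the summary statistic are illustrative assumptions, not the procedure from the original paper.

```python
def mean_abs_effect(effects):
    """Summary statistic for one unit: mean absolute estimated effect."""
    return sum(abs(e) for e in effects) / len(effects)


def placebo_p_value(treated_effects, placebo_effects_by_unit):
    """Share of units whose statistic is at least as large as the treated unit's.

    treated_effects: estimated post-treatment effects for the treated unit.
    placebo_effects_by_unit: one list of estimated effects per control unit,
        each obtained by running the method with that control as pseudo-treated.
    The treated unit counts itself, so the smallest possible value is
    1 / (number of controls + 1).
    """
    treated_stat = mean_abs_effect(treated_effects)
    placebo_stats = [mean_abs_effect(e) for e in placebo_effects_by_unit]
    n_as_extreme = 1 + sum(s >= treated_stat for s in placebo_stats)
    return n_as_extreme / (1 + len(placebo_stats))


# Made-up numbers for illustration: no placebo unit shows an effect as large
# as the treated unit, so the p-value is the minimum possible with 3 controls.
p = placebo_p_value([2.0, 2.1], [[0.1, -0.2], [0.3, 0.1], [-0.1, 0.0]])
```

With only a handful of controls the attainable p-values are coarse (here no smaller than 0.25), which is one more reason the method needs a reasonably large and well-chosen control group.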
