By Arnaud Autef - July 6, 2020
In this week's discussion, we review the Synthetic Controls method, which extends potential outcomes from the Causal Inference literature to time-dependent observational data.
Why Synthetic Controls?
The Synthetic Control Method extends the Potential Outcomes framework from the Causal Inference literature to time-dependent observational data.
This matters because most real-world business and operational data falls into that category: it is collected over time, without a randomized experiment.
Example: Company A assigns promotional campaign P (aka the "treatment") to subpopulation X (aka the "treated unit") from time $T_0$ onward.
→ With synthetic controls, we can answer the question:
What effect did campaign P have on subpopulation X, in terms of Y = average CLTV (customer lifetime value)?
Motivation for Sisu?
Time-dependent potential outcomes model
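Time-steps $t = 1, \dots, T$, distinct units $i = 1, \dots, N$, and a time-dependent metric of interest $Y_{it}$ for each unit. Potential outcomes for unit $i$ at time $t$ are written:
$$Y_{it}(0) \ \text{(untreated)}, \qquad Y_{it}(1) \ \text{(treated)}.$$
Before time-step $T_0$, no unit has been treated, and observations follow the factorized time-series model:
$$Y_{it} = Y_{it}(0) = \delta_t + \lambda_t^\top \mu_i + \varepsilon_{it}, \qquad t < T_0,$$
where $\delta_t$ is a time effect common to all units, $\lambda_t$ a vector of unobserved common factors, $\mu_i$ the factor loadings of unit $i$, and $\varepsilon_{it}$ a noise term.
From time $T_0$, unit $1$ is treated and the other units are kept untreated:
$$Y_{1t} = Y_{1t}(1), \qquad Y_{jt} = Y_{jt}(0) \ \text{ for } j = 2, \dots, N, \qquad t \ge T_0.$$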
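To make the notation concrete, here is a minimal Python sketch that simulates a panel from a factor model of the form $Y_{it}(0) = \delta_t + \lambda_t^\top \mu_i + \varepsilon_{it}$, with one unit treated from period $T_0$ on. The function name `simulate_panel`, the Gaussian draws, and the constant treatment effect `alpha` are illustrative assumptions, not part of the method. Python indices start at 0, so list index 0 plays the role of the treated unit $1$.

```python
import random


def simulate_panel(n_units, n_periods, t0, alpha, n_factors=2, noise_sd=0.1, seed=0):
    """Hypothetical simulator for the time-dependent potential outcomes model.

    Index 0 is the treated unit (unit 1 in the text); indices 1..n_units-1 are controls.
    Periods 0..t0-1 are pre-treatment, periods t0..n_periods-1 are post-treatment.
    Returns (observed, y0, y1) as lists of per-unit time series.
    """
    rng = random.Random(seed)
    # Common time effects delta_t and unobserved common factors lambda_t (Gaussian: an assumption).
    delta = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
    lam = [[rng.gauss(0.0, 1.0) for _ in range(n_factors)] for _ in range(n_periods)]
    # Unit-specific factor loadings mu_i.
    mu = [[rng.gauss(0.0, 1.0) for _ in range(n_factors)] for _ in range(n_units)]

    # Untreated potential outcomes: Y_it(0) = delta_t + lambda_t . mu_i + eps_it.
    y0 = [
        [delta[t] + sum(l * m for l, m in zip(lam[t], mu[i])) + rng.gauss(0.0, noise_sd)
         for t in range(n_periods)]
        for i in range(n_units)
    ]
    # Treated potential outcomes, assuming a constant effect alpha (hypothetical choice).
    y1 = [[y + alpha for y in series] for series in y0]

    # Only unit 0 is treated, and only from period t0 on; every other cell reveals Y_it(0).
    observed = [
        [y1[i][t] if (i == 0 and t >= t0) else y0[i][t] for t in range(n_periods)]
        for i in range(n_units)
    ]
    return observed, y0, y1


observed, y0, y1 = simulate_panel(n_units=5, n_periods=12, t0=8, alpha=2.0)
```

In real data only `observed` is available; `y0[0][t]` for $t \ge T_0$ is exactly the counterfactual that the synthetic control has to estimate.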
Treatment effect definition and estimation
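Definition: treatment effect for the treated unit $1$ at time $t \ge T_0$:
$$\alpha_{1t} = Y_{1t}(1) - Y_{1t}(0).$$
Estimation strategy - Synthetic Controls Method:
For $t < T_0$, we observe $Y_{it} = Y_{it}(0)$ for every unit, and fit weights $w = (w_2, \dots, w_N)$, with $w_j \ge 0$ and $\sum_{j=2}^{N} w_j = 1$, so that a convex combination of the controls tracks the treated unit:
$$Y_{1t} \approx \sum_{j=2}^{N} w_j Y_{jt}, \qquad t < T_0.$$
For $t \ge T_0$, we still observe $Y_{jt} = Y_{jt}(0)$ for the untreated units $j = 2, \dots, N$, so the "synthetic control" $\sum_{j=2}^{N} w_j Y_{jt}$ stands in for the unobserved $Y_{1t}(0)$.
So the treatment effect estimate is:
$$\hat{\alpha}_{1t} = Y_{1t} - \sum_{j=2}^{N} w_j Y_{jt}, \qquad t \ge T_0.$$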
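A minimal, dependency-free Python sketch of this estimation strategy: fit convex weights $w_j$ on the pre-treatment periods, then compute $\hat{\alpha}_{1t} = Y_{1t} - \sum_j w_j Y_{jt}$ after $T_0$. The solver is our own choice, not necessarily the authors': projected gradient descent on the pre-treatment squared error, with the standard sort-based Euclidean projection onto the probability simplex after each step. Function names and the fixed iteration count are hypothetical; in practice a quadratic-programming solver would be the usual tool.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w_j >= 0, sum_j w_j = 1} (sort-based algorithm)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumsum += u_k
        t = (cumsum - 1.0) / k
        if u_k - t > 0:
            theta = t  # keeps the threshold of the largest valid k
    return [max(x - theta, 0.0) for x in v]


def fit_synthetic_weights(treated_pre, controls_pre, n_iter=5000):
    """Convex weights minimizing sum_{t < T0} (Y_1t - sum_j w_j Y_jt)^2.

    treated_pre: pre-treatment series of the treated unit.
    controls_pre: one pre-treatment series per control unit j = 2..N.
    """
    n_controls, n_pre = len(controls_pre), len(treated_pre)
    w = [1.0 / n_controls] * n_controls  # start from uniform weights
    # The gradient's Lipschitz constant is at most 2 * (sum of squared control values): a safe step.
    frob_sq = sum(y * y for series in controls_pre for y in series)
    step = 1.0 / (2.0 * frob_sq)
    for _ in range(n_iter):
        resid = [treated_pre[t] - sum(w[j] * controls_pre[j][t] for j in range(n_controls))
                 for t in range(n_pre)]
        # Gradient of the squared error with respect to w_j: -2 * sum_t resid_t * Y_jt.
        grad = [-2.0 * sum(resid[t] * controls_pre[j][t] for t in range(n_pre))
                for j in range(n_controls)]
        w = project_to_simplex([w[j] - step * grad[j] for j in range(n_controls)])
    return w


def estimate_effects(treated_post, controls_post, w):
    """alpha_hat_1t = Y_1t - sum_j w_j Y_jt for each post-treatment period."""
    return [treated_post[t] - sum(w_j * series[t] for w_j, series in zip(w, controls_post))
            for t in range(len(treated_post))]


# Toy data: the treated unit equals 0.2*A + 0.5*B + 0.3*C before T0, plus an effect of 2.0 after.
controls = [[1, 2, 3, 4, 5, 6, 7, 8, 9],
            [6, 5, 4, 3, 2, 1, 0, 1, 2],
            [1, 4, 1, 4, 1, 4, 1, 4, 1]]
t0 = 6
treated = [0.2 * a + 0.5 * b + 0.3 * c + (2.0 if t >= t0 else 0.0)
           for t, (a, b, c) in enumerate(zip(*controls))]
w = fit_synthetic_weights(treated[:t0], [s[:t0] for s in controls])
effects = estimate_effects(treated[t0:], [s[t0:] for s in controls], w)
```

On this toy panel the fitted weights recover (0.2, 0.5, 0.3) and every post-treatment estimate is close to 2.0, because the treated unit lies exactly in the convex hull of the controls.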
Main theoretical result:
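(~SUTVA) The treatment of unit $1$ has no indirect effect on the control units $j = 2, \dots, N$: their observed outcomes remain $Y_{jt} = Y_{jt}(0)$ at every period.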
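(Controls approximate the treated unit well) There exist weights $w^* = (w^*_2, \dots, w^*_N)$, with $w^*_j \ge 0$ and $\sum_{j=2}^{N} w^*_j = 1$, such that:
$$\sum_{j=2}^{N} w^*_j Y_{jt} = Y_{1t} \qquad \text{for all } t < T_0.$$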
(~No confounding) Noise terms $\varepsilon_{it}$ are independent across units and time, with mean $0$ and finite variance.
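The Synthetic Controls estimator is then asymptotically unbiased as the number of pre-treatment periods $T_0$ grows: for every post-treatment period $t \ge T_0$,
$$\mathbb{E}\big[\hat{\alpha}_{1t}\big] - \alpha_{1t} \longrightarrow 0 \qquad \text{as } T_0 \to \infty.$$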
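A short derivation sketch shows where the remaining bias comes from. Assume untreated outcomes follow the factor model $Y_{it}(0) = \delta_t + \lambda_t^\top \mu_i + \varepsilon_{it}$ at every period, take the estimate $\hat{\alpha}_{1t} = Y_{1t} - \sum_{j=2}^{N} w_j Y_{jt}$ with $w_j \ge 0$ and $\sum_{j=2}^{N} w_j = 1$, and the effect $\alpha_{1t} = Y_{1t}(1) - Y_{1t}(0)$. For $t \ge T_0$, the weights summing to one make $\delta_t$ cancel:

```latex
\begin{aligned}
\hat{\alpha}_{1t} - \alpha_{1t}
  &= Y_{1t}(0) - \sum_{j=2}^{N} w_j Y_{jt}(0) \\
  &= \lambda_t^\top \Big( \mu_1 - \sum_{j=2}^{N} w_j \mu_j \Big)
   + \Big( \varepsilon_{1t} - \sum_{j=2}^{N} w_j \varepsilon_{jt} \Big).
\end{aligned}
```

If the weights are fitted on pre-treatment data only and the noise is independent across periods, the second term has mean zero, so any bias is driven by the loading mismatch $\mu_1 - \sum_j w_j \mu_j$. Matching $Y_{1t}$ closely over many pre-treatment periods constrains this mismatch, which is the intuition behind the result above; the formal statement also needs regularity conditions on the factors $\lambda_t$.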
BIGGEST CAVEAT: convex combinations are a very restrictive class of approximating functions. If no convex combination of the controls fits the treated unit well, the estimated treatment effects can be heavily polluted by the bias of that fit → a good control group is required.
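One cheap diagnostic for this caveat, sketched in Python under our own (hypothetical) naming: at each pre-treatment period, any convex combination $\sum_j w_j Y_{jt}$ lies between $\min_j Y_{jt}$ and $\max_j Y_{jt}$, so the distance from $Y_{1t}$ to that interval is a lower bound on the fit error of every possible set of weights. Nonzero gaps mean no convex combination can fit the treated unit exactly; zero gaps do not guarantee a good fit, because the same weights must work at every period.

```python
import math


def convex_fit_gaps(treated_pre, controls_pre):
    """Per-period distance from Y_1t to the interval [min_j Y_jt, max_j Y_jt]."""
    gaps = []
    for t, y1 in enumerate(treated_pre):
        values = [series[t] for series in controls_pre]
        lo, hi = min(values), max(values)
        # Zero when Y_1t lies inside the interval, positive when it lies outside.
        gaps.append(max(lo - y1, 0.0, y1 - hi))
    return gaps


def rmse_lower_bound(treated_pre, controls_pre):
    """No convex weights can achieve a pre-treatment RMSE below this value."""
    gaps = convex_fit_gaps(treated_pre, controls_pre)
    return math.sqrt(sum(g * g for g in gaps) / len(gaps))


# A treated unit that sits above every control at every period cannot be matched.
treated_pre = [10.0, 12.0, 14.0]
controls_pre = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(convex_fit_gaps(treated_pre, controls_pre))  # [6.0, 7.0, 8.0]
print(rmse_lower_bound(treated_pre, controls_pre))
```

A large lower bound here is a signal to revisit the control group (or the outcome's scale) before trusting any estimated effect.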
To fit the weights $w_j$ of the convex combination $\sum_{j=2}^{N} w_j Y_{jt}$, the authors advise using regularization and, when there is enough pre-treatment data, validation.
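A hedged Python sketch of what "regularization and validation" can look like. The specific choices are ours, not necessarily the authors': an L2 penalty $\rho \sum_j w_j^2$ (which pulls the weights toward uniform, spreading them over more controls), and a hold-out of the last `n_val` pre-treatment periods to pick $\rho$ from a small grid. Weights are fitted by projected gradient descent onto the simplex; all function names are hypothetical.

```python
def project_to_simplex(v):
    """Euclidean projection onto {w : w_j >= 0, sum_j w_j = 1} (sort-based algorithm)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumsum += u_k
        if u_k - (cumsum - 1.0) / k > 0:
            theta = (cumsum - 1.0) / k
    return [max(x - theta, 0.0) for x in v]


def fit_weights(treated, controls, penalty, n_iter=5000):
    """Minimize sum_t (Y_1t - sum_j w_j Y_jt)^2 + penalty * sum_j w_j^2 over the simplex."""
    n_controls, n_periods = len(controls), len(treated)
    w = [1.0 / n_controls] * n_controls
    frob_sq = sum(y * y for series in controls for y in series)
    step = 1.0 / (2.0 * (frob_sq + penalty))  # safe step for the penalized objective
    for _ in range(n_iter):
        resid = [treated[t] - sum(w[j] * controls[j][t] for j in range(n_controls))
                 for t in range(n_periods)]
        grad = [-2.0 * sum(resid[t] * controls[j][t] for t in range(n_periods))
                + 2.0 * penalty * w[j]
                for j in range(n_controls)]
        w = project_to_simplex([w[j] - step * grad[j] for j in range(n_controls)])
    return w


def mse(treated, controls, w):
    """Mean squared gap between the treated series and its synthetic control."""
    errors = [treated[t] - sum(w_j * s[t] for w_j, s in zip(w, controls))
              for t in range(len(treated))]
    return sum(e * e for e in errors) / len(errors)


def select_penalty(treated_pre, controls_pre, penalties, n_val):
    """Fit on early pre-treatment periods, score on the last n_val ones, refit with the best penalty."""
    split = len(treated_pre) - n_val
    train_c = [s[:split] for s in controls_pre]
    val_c = [s[split:] for s in controls_pre]
    scores = {rho: mse(treated_pre[split:], val_c, fit_weights(treated_pre[:split], train_c, rho))
              for rho in penalties}
    best_rho = min(scores, key=scores.get)
    return best_rho, fit_weights(treated_pre, controls_pre, best_rho)


controls_pre = [[1, 2, 3, 4, 5, 6, 7, 8],
                [6, 5, 4, 3, 2, 1, 0, 1],
                [1, 4, 1, 4, 1, 4, 1, 4]]
treated_pre = [0.2 * a + 0.5 * b + 0.3 * c for a, b, c in zip(*controls_pre)]
best_rho, w = select_penalty(treated_pre, controls_pre, penalties=[0.0, 1.0, 10.0], n_val=2)
```

On this noiseless toy panel validation picks no penalty, as expected; with noisy, short pre-treatment series a positive $\rho$ would typically be chosen instead.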
How do we assess the significance of the estimated treatment effects? → In the original paper, the authors propose "placebo tests": are the treatment effects estimated for the treated unit $1$ much larger than the estimates we would have obtained by applying Synthetic Controls to each control unit $j = 2, \dots, N$, as if it had been treated?
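A minimal Python sketch of such a placebo test, under stated assumptions. Each control is treated in turn as if it were the treated unit, with the remaining controls as donors. Every run is summarized by the ratio of post-treatment to pre-treatment root mean squared prediction error (RMSPE), a common summary for placebo tests and our choice here. The p-value is the share of units whose ratio is at least as large as the treated unit's. The solver (projected gradient descent onto the simplex) and all names are hypothetical.

```python
import math


def project_to_simplex(v):
    """Euclidean projection onto {w : w_j >= 0, sum_j w_j = 1} (sort-based algorithm)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumsum += u_k
        if u_k - (cumsum - 1.0) / k > 0:
            theta = (cumsum - 1.0) / k
    return [max(x - theta, 0.0) for x in v]


def fit_weights(target_pre, donors_pre, n_iter=5000):
    """Convex weights minimizing the pre-treatment squared error (projected gradient descent)."""
    n_donors, n_pre = len(donors_pre), len(target_pre)
    w = [1.0 / n_donors] * n_donors
    step = 1.0 / (2.0 * sum(y * y for s in donors_pre for y in s))
    for _ in range(n_iter):
        resid = [target_pre[t] - sum(w[j] * donors_pre[j][t] for j in range(n_donors))
                 for t in range(n_pre)]
        grad = [-2.0 * sum(resid[t] * donors_pre[j][t] for t in range(n_pre))
                for j in range(n_donors)]
        w = project_to_simplex([w[j] - step * grad[j] for j in range(n_donors)])
    return w


def rmspe(target, donors, w):
    errs = [target[t] - sum(w_j * s[t] for w_j, s in zip(w, donors)) for t in range(len(target))]
    return math.sqrt(sum(e * e for e in errs) / len(errs))


def post_pre_ratio(unit, donors, t0):
    """Post-treatment RMSPE divided by pre-treatment RMSPE of a synthetic control for `unit`."""
    w = fit_weights(unit[:t0], [d[:t0] for d in donors])
    pre = rmspe(unit[:t0], [d[:t0] for d in donors], w)
    post = rmspe(unit[t0:], [d[t0:] for d in donors], w)
    return post / max(pre, 1e-12)  # guard against an exact pre-treatment fit


def placebo_test(treated, controls, t0):
    """Rank the treated unit's ratio among placebo runs on each control."""
    treated_ratio = post_pre_ratio(treated, controls, t0)
    placebo_ratios = [post_pre_ratio(controls[k], controls[:k] + controls[k + 1:], t0)
                      for k in range(len(controls))]
    ratios = [treated_ratio] + placebo_ratios
    # Share of units (treated unit included) whose ratio is at least the treated unit's.
    p_value = sum(r >= treated_ratio for r in ratios) / len(ratios)
    return treated_ratio, placebo_ratios, p_value


controls = [[1, 2, 3, 4, 5, 6, 7, 8, 9],
            [6, 5, 4, 3, 2, 1, 0, 1, 2],
            [1, 4, 1, 4, 1, 4, 1, 4, 1],
            [2, 2, 5, 5, 3, 3, 4, 4, 6]]
t0 = 6
treated = [0.2 * a + 0.5 * b + 0.3 * c + (5.0 if t >= t0 else 0.0)
           for t, (a, b, c, _) in enumerate(zip(*controls))]
treated_ratio, placebo_ratios, p_value = placebo_test(treated, controls, t0)
```

With one treated unit and four controls, the smallest achievable p-value is 1/5; small donor pools therefore limit how much significance a placebo test can ever show.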
If you like applying these kinds of methods to practical ML problems, join our team.