This week, we discussed Prophet, an automatic, open-source, time-series forecasting tool by Facebook.
- Time-series forecasting is highly relevant to anomaly detection, which Sisu is interested in doing.
- Prophet focuses on business time series similar to the ones we face at Sisu.
- Prophet is highly automatic (requires little tuning by analysts/users), which is something we want in our products too.
- Model the time series as y(t) = g(t) + s(t) + h(t) + ε_t, with g(t) the trend, s(t) seasonality, h(t) holidays/outlier days, and ε_t Gaussian noise.
- Model g(t) as a piecewise-linear function, with a sparse (Laplace) prior on the slope changes at a large set of initial candidate breakpoints.
- Model s(t) as a sum of Fourier series terms, with each coefficient having a Gaussian prior.
- Model h(t) by manually selecting a set of business-relevant holidays/special days, each with a Gaussian prior on its effect.
- Assume a prior on each component model and train using maximum a posteriori (MAP) estimation. This lets us directly obtain posterior estimates of the probability of new observations.
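The components above can be sketched in a few lines of numpy. This is a simplified illustration, not Prophet's actual implementation (which fits everything jointly in Stan); the function and parameter names here are our own. Note that under MAP estimation, the Laplace prior on the trend's slope changes becomes an L1 penalty, and the Gaussian priors on the seasonal coefficients reduce to ridge regression.

```python
import numpy as np

def piecewise_linear_trend(t, changepoints, k, m, deltas):
    """Trend g(t): base slope k and offset m, with slope adjustments
    `deltas` applied after each changepoint. The gamma term keeps the
    trend continuous at each breakpoint."""
    # A[i, j] = 1 if t[i] is past changepoint j
    A = (t[:, None] >= changepoints[None, :]).astype(float)
    gamma = -changepoints * deltas
    return (k + A @ deltas) * t + (m + A @ gamma)

def fourier_features(t, period, order):
    """Design matrix for s(t): sin/cos pairs up to `order` harmonics.
    s(t) is then a learned linear combination of these columns."""
    n = np.arange(1, order + 1)
    x = 2.0 * np.pi * np.outer(t, n) / period
    return np.hstack([np.sin(x), np.cos(x)])

def map_seasonal_fit(X, y, prior_scale=10.0, noise_scale=1.0):
    """MAP estimate of seasonal coefficients: a Gaussian prior plus
    Gaussian noise is equivalent to ridge regression with
    lambda = (noise_scale / prior_scale)^2."""
    lam = (noise_scale / prior_scale) ** 2
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
```

With no slope changes (`deltas = 0`), the trend reduces to the plain line k*t + m, and the Fourier design matrix is exactly periodic in `period`.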
- New evaluation procedure for time series models:
- Choose a business-problem appropriate loss L.
- Specify a set H of time horizons of interest, for example horizons measured in days.
- Define a sub-horizon loss L_h for each h ∈ H.
- Loop over the time series dataset and make a length-max(H) forecast every k time steps.
- With these forecasts, compute the sub-horizon losses L_h.
- In general, we expect L_h to be smooth and nondecreasing in h. If this is not the case, consider using isotonic regression or similar to smooth out the results.
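The procedure above can be sketched as a rolling-origin evaluation loop. This is our own illustrative version, not Prophet's cross-validation API; the parameter names (`every`, `min_train`) are assumptions.

```python
import numpy as np

def sub_horizon_losses(y, forecast_fn, horizons, every=5, min_train=30,
                       loss=lambda actual, pred: np.abs(actual - pred)):
    """Every `every` steps, fit on y[:cutoff] and forecast max(H) steps
    ahead; accumulate the loss at each horizon h and return the mean
    sub-horizon loss L_h."""
    H = max(horizons)
    per_h = {h: [] for h in horizons}
    for cutoff in range(min_train, len(y) - H, every):
        preds = forecast_fn(y[:cutoff], H)   # length-max(H) forecast
        for h in horizons:
            per_h[h].append(loss(y[cutoff + h - 1], preds[h - 1]))
    return {h: float(np.mean(v)) for h, v in per_h.items()}
```

For example, a last-value forecaster on a linearly increasing series yields L_h = h exactly, which is smooth and nondecreasing as expected.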
- General advice for how to evaluate time-series models:
- Compare against both other advanced models (e.g., ETS, TBATS) and robust baselines (last value, sample mean).
- Validate over varying time horizons h.
- Look at both the graph of forecasted time-series as well as the error curves obtained from the evaluation procedure described above.
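The two robust baselines mentioned above are trivial to implement, which is exactly why they make good sanity checks (a sketch; names are ours):

```python
import numpy as np

def last_value_forecast(history, horizon):
    """Naive baseline: repeat the last observed value for the whole horizon."""
    return np.full(horizon, history[-1])

def mean_forecast(history, horizon):
    """Baseline: repeat the sample mean of the full history."""
    return np.full(horizon, np.mean(history))
```

Both plug directly into the evaluation loop described earlier, so baseline and model error curves come out of the same procedure and are directly comparable.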
- Time-series models will inevitably do poorly in some situations, so it is important to set up a system for detecting when these scenarios occur. Advice on how to do this:
- Look at error relative to robust baseline models: if the model's error is similar or greater, something is off.
- If all models perform badly on certain data points, those points are likely outliers.
- If error across horizons sharply increases at a certain cutoff, there may have been a change in the data-generating process, or the model may be miscalibrated.
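The first check above is easy to automate once per-horizon losses are available for both the model and a baseline. A minimal sketch, assuming per-horizon loss dicts as input; the 0.95 ratio threshold is an arbitrary assumption, not a recommendation:

```python
def flag_model_issues(model_loss, baseline_loss, tol=0.95):
    """Return the horizons where the model's loss is similar to or worse
    than a robust baseline's loss (ratio >= tol counts as 'similar')."""
    return [h for h in model_loss
            if model_loss[h] >= tol * baseline_loss[h]]
```

Flagged horizons are candidates for the other two diagnoses: shared failures across all models point to outliers, while a sharp per-horizon jump points to a change in the data-generating process.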
If you like applying these kinds of methods to practical ML problems, join our team.