ML Systems

Bayesian Online Changepoint Detection | July 27, 2021

By John Hallman - July 27, 2021

Materials

Why BOCD?

  • Changepoint detection is a problem setting that focuses on detecting distributional changes in time-series data, such as trend changes or mean and volatility increases.
  • BOCD is a changepoint detection algorithm that focuses on the online setting, where we try to model trend changes as data comes in, rather than post hoc when the entire time series can be observed.
  • By combining Bayesian probability and exponential family distributions with convenient conjugate priors, BOCD is able to model a diverse set of time series distributions while being far more efficient than a lot of alternative time series models that use computationally expensive offline training.

Nuggets

  • The algorithm takes as input:
    • (1) a Hazard function which determines how frequent changepoints are.
    • (2) an exponential family and default prior parameters.
  • The key idea behind BOCD is the following - at time tt, we track the probability that a changepoint occurred kk time points earlier for all values of kk from 00 to tt, along with a set of exponential family prior parameters for each such value kk, conditional on a changepoint having occurred at time tkt - k.
  • If we are given the time of the last changepoint, then it is easy to compute the posterior parameters since we are using conjugate exponential family distributions. Therefore, the key difficulty is modeling the probability of the most recent changepoint from time 0 to tt.
  • Conveniently, this can be done in an online manner! If we have the distribution up to time t1t-1 and observe the next datapoint xtx_t, the distribution of the most recent changepoint at time tt is a simple function of the hazard, the current prior parameters, and the distribution at time t1t-1.
  • Finally, when using BOCD in practice, we may be interested in clearly defined timestamps where the changepoints occur, for example if we want to alert when there is a change. The most straightforward approach, which was also used in the paper, is to take the maximum a posteriori probability of the changepoint distribution.

Notes

If you like applying these kinds of methods to practical ML problems, join our team.


Read more

Prophet | February 9, 2021

In this conversation, we take a look at Prophet, an automatic, open-source, time-series forecasting tool by Facebook.

Read more

Prophet | February 9, 2021

In this conversation, we take a look at Prophet, an automatic, open-source, time-series forecasting tool by Facebook.

Read more