# BlitzWS | May 18, 2021

By Nivedita Bhaskhar - May 18, 2021

In our discussion this week, we looked at a "fast and principled working set algorithm" called BlitzWS proposed by Johnson and Guestrin.

Materials

Paper link: Johnson and Guestrin, 2018

Why BlitzWS?

BlitzWS is a working set algorithm designed to optimize a piecewise objective function, where each piece, as well as the objective function itself, is assumed to have nice convexity properties. BlitzWS is an iterative algorithm: in each iteration it optimizes a relaxed objective function, which is generally a simpler subproblem to solve. By solving a sequence of such subproblems, BlitzWS converges to the original problem's solution.

The highlight of the paper is that BlitzWS carefully constructs the relaxed objective function at each step so as to guarantee a specified amount of progress in each iteration. Further, BlitzWS can be applied to a wide range of practical problems, including SVMs and the Lasso.

Nuggets

• Optimization problem and shape of the objective function

Find the minimizer of the function $f(x) = \psi(x) + \sum_{i=1}^{m} \phi_i(x)$ where

• All functions are from $\mathbb{R}^n\to \mathbb{R}$
• Each $\phi_i$ is a piecewise function, i.e. there is a partition $\pi_i : \mathbb{R}^n \to \{1,2,\ldots, p_i\}$ so that on the $j$-th piece $\pi_i^{-1}(j)$, $\phi_i = \phi_i^{(j)}$
• Convexity and continuity assumptions about $f$
• Assume $\psi$, $\phi_i$, $\phi_i^{(j)}$ are all convex and lower semicontinuous

• Assume $\psi$ is 1-strongly-convex, i.e.

$\psi(y) \geq \psi(x) + \langle y-x, \nabla \psi(x) \rangle + \frac{1}{2}||y-x||^2$
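As a concrete instance of this piecewise structure (a toy example of ours, not from the paper), the hinge loss used in SVMs fits the definition with $p_i = 2$ pieces:

```python
# Toy illustration (ours, not the paper's code): the hinge loss
# phi(x) = max(0, 1 - x) is a piecewise convex function with p = 2
# pieces. The partition pi maps each point to its active piece.

def pi(x):
    """Partition of R: piece 1 where the loss is zero, piece 2 where it is affine."""
    return 1 if x >= 1.0 else 2

def phi_piece(j, x):
    """The j-th piece; each piece is convex (here, affine)."""
    return 0.0 if j == 1 else 1.0 - x

def phi(x):
    """The piecewise function agrees with its active piece everywhere."""
    return phi_piece(pi(x), x)

for x in [-0.5, 0.3, 1.0, 2.0]:
    assert phi(x) == max(0.0, 1.0 - x)
```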

• How to measure progress

In the $t$-th iteration

• $f_t$ is the relaxed objective function that is minimized, which will be arranged so that $f_t\leq f$
• BlitzWS computes two iterates
• $x_t$ which is the argmin of $f_t$
• A second iterate $y_t$
• Sub-optimality gap for the $t$-th iteration is defined to be $\Delta_t := f(y_t)-f_t(x_t)$
• Let $x^*$ be the minimizer of $f$. Then note that $f_t(x_t)\leq f(x^*) \leq f(y_t)$ and so $\Delta_t \geq 0$
• $\Delta_t = 0 \implies f(y_t) = f(x^*)$, i.e. $y_t$ is optimal, so driving $\Delta_t$ to zero is progress!
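A toy 1-D illustration of this sandwich (our own numbers, not the paper's construction): replacing $|x|$ by its linear piece $x$ gives a global lower bound on $f$, and the resulting gap is nonnegative.

```python
# Toy 1-D illustration of the sandwich f_t(x_t) <= f(x*) <= f(y_t),
# which makes Delta_t nonnegative.
#
# f(x) = 0.5*x^2 + |x|, minimized at x* = 0 with f(x*) = 0.
# Relaxation: replace |x| by its linear piece x (a global lower
# bound, since |x| >= x for all x), giving f_t(x) = 0.5*x^2 + x.

def f(x):
    return 0.5 * x**2 + abs(x)

def f_t(x):
    return 0.5 * x**2 + x  # f_t <= f everywhere

x_t = -1.0     # argmin of f_t (set derivative x + 1 = 0)
y_t = 0.3      # any feasible iterate with f(y_t) finite
x_star = 0.0

gap = f(y_t) - f_t(x_t)   # Delta_t = 0.345 - (-0.5) = 0.845
assert f_t(x_t) <= f(x_star) <= f(y_t)
assert gap >= 0
```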
• BlitzWS algorithm

In the $t$-th iteration:

• Select progress parameter $\xi_t\in (0,1]$ (Note that a larger $\xi_t$ will imply a guarantee of more progress - see theorem below)
• Determine relaxed objective function $f_t$ which will be of the shape $f_t = \psi + \sum_{i=1}^{m} \phi_{i,t}$ where $\phi_{i,t}$ will either be $\phi_i$ or one of the pieces $\phi_i^{(j)}$
• Compute $x_t$, the minimizer of $f_t$ (i.e. solve subproblem $t$)
• Compute second iterate $y_t$ to be minimizer of $f(x)$ subject to the constraint that it lies on the line segment joining $y_{t-1}$ and $x_{t}$

In the $0$-th iteration:

• Choose any $\phi_{i,0} \leq \phi_i$ and any $y_0$ so that $f(y_0)< \infty$

Termination condition

• Run the algorithm until $x_T = y_T$; at this point, return $y_T$
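Putting the steps above together, here is a heavily simplified 1-D sketch of the iteration structure. This is our own illustration under strong simplifying assumptions: the working-set choice is just "keep the piece of $|x|$ active at the current iterate", and the line search is a generic ternary search; the paper's subproblem construction and progress control are more sophisticated.

```python
# Simplified 1-D sketch of the BlitzWS iteration structure (illustrative
# only). Objective: f(x) = psi(x) + phi(x), with
#   psi(x) = 0.5*(x - 2)^2   (1-strongly convex)
#   phi(x) = |x|             (piecewise: pieces x and -x)
# True minimizer: x* = 1.

def psi(x): return 0.5 * (x - 2.0) ** 2
def phi(x): return abs(x)
def f(x):   return psi(x) + phi(x)

def active_piece_slope(x):
    """Pick the piece of |x| active at x: slope +1 or -1."""
    return 1.0 if x >= 0 else -1.0

def line_search(a, b, fn, iters=200):
    """Ternary search for the minimizer of a convex fn on the segment [a, b]."""
    lo, hi = min(a, b), max(a, b)
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if fn(m1) < fn(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

y = -3.0  # y_0: any point with f(y_0) finite
for t in range(20):
    s = active_piece_slope(y)     # relaxed piece: s*x <= |x| globally
    x_t = 2.0 - s                 # argmin of f_t(x) = psi(x) + s*x
    y = line_search(y, x_t, f)    # second iterate: best point on the segment
    if abs(x_t - y) < 1e-9:       # termination: x_T = y_T
        break

print(round(y, 6))  # ~1.0, the true minimizer
```

Note that both pieces $sx$ with $s = \pm 1$ lower-bound $|x|$ everywhere, so each relaxed $f_t$ satisfies $f_t \leq f$ as required.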
• Theoretical guarantee and results

• At any time-step $t>0$, BlitzWS guarantees a decrease in the sub-optimality gap controlled by the progress parameter, as follows: $\Delta_t \leq (1-\xi_t)\Delta_{t-1}$
• BlitzWS can also be modified to accommodate approximate solutions to subproblems (instead of finding precise minimizers $x_t$ of $f_t$)
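Unrolling the per-iteration guarantee gives a multiplicative bound on the gap after $T$ iterations:

$\Delta_T \leq \left(\prod_{t=1}^{T} (1-\xi_t)\right) \Delta_0$

so, in particular, a constant choice $\xi_t = \xi$ yields linear (geometric) convergence of the sub-optimality gap.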

Raw Notes

If you like applying these kinds of methods to practical ML problems, join our team.