Optimization

By Nivedita Bhaskhar - May 18, 2021

In our discussion this week, we looked at a "fast and principled working set algorithm" called BlitzWS proposed by Johnson and Guestrin.

**Materials**

Paper link, Johnson and Guestrin, 2018

**Why BlitzWS?**

BlitzWS is a working set algorithm designed for optimizing a *piecewise* objective function, where each piece, as well as the overall objective, is assumed to have nice convexity properties. BlitzWS is an iterative algorithm: during each iteration, it optimizes a *relaxed objective function*, which is generally a simpler sub-problem to solve. By solving a sequence of such subproblems, BlitzWS converges to the original problem’s solution.

The highlight of the paper is that BlitzWS carefully constructs the relaxed objective function at each step so as to guarantee a specified amount of progress in each iteration. Further, BlitzWS applies to a wide range of practical problems, including SVM and Lasso.

**Nuggets**

**Optimization problem and shape of the objective function**

Find the minimizer of the function $f(x) = \psi(x) + \sum_{i=1}^{m} \phi_i(x)$ where

- All functions are from $\mathbb{R}^n\to \mathbb{R}$
- Each $\phi_i$ is a *piecewise* function, i.e. there is a partition $\pi_i : \mathbb{R}^n \to \{1,2,\ldots, p_i\}$ so that on the $j$-th part $\pi_i^{-1}(j)$, we have $\phi_i = \phi_i^{(j)}$

**Convexity and continuity assumptions about $f$**

- Assume $\psi$, $\phi_i$, and $\phi_i^{(j)}$ are all convex and lower semicontinuous
- Assume $\psi$ is **1-strongly-convex**, i.e. $\psi(y) \geq \psi(x) + \langle y-x, \nabla \psi(x) \rangle + \frac{1}{2}\|y-x\|^2$
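To make the assumed shape concrete, here is a minimal sketch (our illustrative choice, not an example from the paper): $\psi(x) = \frac{1}{2}x^2$, which is 1-strongly convex, together with a single piecewise term $\phi(x) = \max(0, x-1)$ with $p = 2$ pieces.

```python
# Toy instance of f = psi + phi (hypothetical example, not from the paper):
# psi(x) = 0.5*x^2 is 1-strongly convex; phi(x) = max(0, x - 1) is
# piecewise with two convex pieces.

def psi(x):
    return 0.5 * x * x

def pi(x):
    """The partition pi: R -> {1, 2}, giving the piece active at x."""
    return 1 if x <= 1.0 else 2

pieces = {1: lambda x: 0.0,        # phi^(1): the zero piece
          2: lambda x: x - 1.0}    # phi^(2): the linear piece

def phi(x):
    return pieces[pi(x)](x)        # phi agrees with its active piece

def f(x):
    return psi(x) + phi(x)

# Every piece lower-bounds phi everywhere; this is what lets a relaxation
# that swaps phi for one of its pieces satisfy f_t <= f.
for x in [-2.0, 0.5, 1.0, 3.0]:
    assert all(p(x) <= phi(x) for p in pieces.values())
```

Note that each piece is a global lower bound on $\phi$; the relaxations used by BlitzWS rely on exactly this property.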

**How to measure progress**

In the $t$-th iteration:

- $f_t$ is the relaxed objective function that is minimized, which will be arranged so that $f_t\leq f$
- BlitzWS computes two iterates:
  - $x_t$, the argmin of $f_t$
  - a second iterate $y_t$

- The *sub-optimality gap* for the $t$-th iteration is defined to be $\Delta_t := f(y_t) - f_t(x_t)$
- Let $x^*$ be the minimizer of $f$. Then note that $f_t(x_t) \leq f(x^*) \leq f(y_t)$, and so $\Delta_t \geq 0$
- $\Delta_t = 0 \implies f(y_t) = f(x^*)$, so decreasing $\Delta_t$ is progress!
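The sandwich $f_t(x_t) \leq f(x^*) \leq f(y_t)$ can be checked numerically on a tiny 1-D example (our hypothetical instance): $f(x) = \frac{1}{2}x^2 + \max(0, x-1)$, relaxed by replacing the hinge with its zero piece, so that $f_t(x) = \frac{1}{2}x^2 \leq f(x)$.

```python
# Numeric check of f_t(x_t) <= f(x*) <= f(y_t) and Delta_t >= 0 on a
# 1-D toy (hypothetical example, not from the paper).

def f(x):
    return 0.5 * x * x + max(0.0, x - 1.0)

def f_t(x):
    return 0.5 * x * x       # relaxation: hinge replaced by its zero piece

x_t = 0.0                    # argmin of f_t
x_star = 0.0                 # argmin of f (the hinge is inactive at 0)
y_t = 0.5                    # some feasible iterate

gap = f(y_t) - f_t(x_t)      # Delta_t = f(y_t) - f_t(x_t)
assert f_t(x_t) <= f(x_star) <= f(y_t)
assert gap >= 0.0
print(gap)                   # 0.125
```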

**BlitzWS algorithm**

In the $t$-th iteration:

- Select progress parameter $\xi_t\in (0,1]$ (Note that a larger $\xi_t$ will imply a guarantee of more progress - see theorem below)
- Determine relaxed objective function $f_t$ which will be of the shape $f_t = \psi + \sum_{i=1}^{m} \phi_{i,t}$ where $\phi_{i,t}$ will either be $\phi_i$ or one of the pieces $\phi_i^{(j)}$
- Compute $x_t$, the minimizer of $f_t$ (i.e. solve subproblem $t$)
- Compute second iterate $y_t$ to be minimizer of $f(x)$ subject to the constraint that it lies on the line segment joining $y_{t-1}$ and $x_{t}$

In the $0$-th iteration:

- Choose any $\phi_{i,0} \leq \phi_i$ and any $y_0$ so that $f(y_0)< \infty$

Termination condition

- Run algorithm till $x_T = y_T$. At this point, return $y_T$
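The loop above can be sketched end-to-end on a 1-D toy problem. This is a simplified illustration, not the paper's implementation: $\psi(x) = \frac{1}{2}x^2$, each $\phi_i(x) = \max(0, a_i x + b_i)$ has two pieces (zero and linear), and the working-set rule below (keep the full $\phi_i$ when its kink is within a threshold `tau` of the reference point) is a stand-in for the paper's progress-based selection.

```python
# Minimal 1-D sketch of the BlitzWS loop (our toy, not the paper's code).
# Every piece of phi_i lower-bounds phi_i, so each relaxation obeys f_t <= f.

def argmin_convex(g, lo=-10.0, hi=10.0, iters=200):
    """Ternary search: minimize a convex 1-D function on [lo, hi]."""
    for _ in range(iters):
        m1, m2 = lo + (hi - lo) / 3, hi - (hi - lo) / 3
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2

A = [1.0, -1.0, 0.3]              # toy data for phi_i(x) = max(0, a*x + b)
B = [-0.2, 0.5, 0.1]

def f(x):
    return 0.5 * x * x + sum(max(0.0, a * x + b) for a, b in zip(A, B))

def make_relaxation(ref, tau=0.3):
    """Build f_t: full phi_i on the working set, a single piece otherwise."""
    def f_t(x):
        val = 0.5 * x * x
        for a, b in zip(A, B):
            z = a * ref + b
            if abs(z) <= tau:          # working set: keep the kink
                val += max(0.0, a * x + b)
            elif z > 0:                # linear piece active at ref
                val += a * x + b
            # else: zero piece active at ref, contributes nothing
        return val
    return f_t

y = 0.0                                # y_0 with f(y_0) < infinity
gaps = []
for t in range(1, 21):
    f_t = make_relaxation(y)
    x_t = argmin_convex(f_t)           # solve subproblem t
    # y_t: minimize f over the segment joining y_{t-1} and x_t
    theta = argmin_convex(lambda s: f(y + s * (x_t - y)), 0.0, 1.0)
    y = y + theta * (x_t - y)
    gaps.append(f(y) - f_t(x_t))       # sub-optimality gap Delta_t
    if gaps[-1] < 1e-10:               # x_t and y_t have effectively met
        break

# the minimum of this toy f sits at a kink (x* = 0.2), which is exactly
# the situation where keeping the full phi_i in the working set matters
print(round(y, 4), round(f(y), 4))
```

Keeping the full (nonsmooth) $\phi_i$ for working-set terms is essential here: if every term were replaced by a single piece chosen at the reference point, the iterates could oscillate around a kink minimum without driving $\Delta_t$ to zero.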

**Theoretical guarantee and results**

- At any time-step $t>0$, BlitzWS guarantees a decrease in the sub-optimality gap controlled by the progress parameter as follows: $\Delta_t \leq (1-\xi_t)\Delta_{t-1}$
- BlitzWS can also be modified to accommodate approximate solutions to subproblems (instead of finding precise minimizers $x_t$ of $f_t$)
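Unrolling the per-iteration bound across $T$ iterations (our observation, immediate from the guarantee above) shows how the progress parameters compound:

$$\Delta_T \;\leq\; \Delta_0 \prod_{t=1}^{T} (1 - \xi_t)$$

so for a constant choice $\xi_t = \xi$, we get $\Delta_T \leq (1-\xi)^T \Delta_0 \leq e^{-\xi T}\Delta_0$, and hence $\Delta_T \leq \epsilon$ after $T \geq \frac{1}{\xi}\log(\Delta_0/\epsilon)$ iterations.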

**Raw Notes**

- Slides for BlitzMN (toy problem of finding minimum norm in a polytope)
- Slides for BlitzWS
- Hand written notes

*If you like applying these kinds of methods to practical ML problems, join our team.*