Optimization

BlitzWS | May 18, 2021

By Nivedita Bhaskhar - May 18, 2021

In our discussion this week, we looked at a "fast and principled working set algorithm" called BlitzWS proposed by Johnson and Guestrin.

Materials

Paper link, Johnson and Guestrin, 2018

Why BlitzWS?

BlitzWS is a working set algorithm designed for optimizing a piecewise objective function, where each piece and also the objective function is assumed to have nice convexity properties. BlitzWS is an iterative algorithm, where during each iteration, it optimizes a relaxed objective function which is generally a simpler sub-problem to solve. By solving a sequence of such subproblems, BlitzWS converges to the original problem’s solution.

The highlight of the paper is that BlitzWS carefully constructs the relaxed objective function at each step so as to guarantee a specified amount of progress in each iteration. Further BlitzWS can be applied to a wide range of practical problems including SVM, Lasso et al.

Nuggets

  • Optimization problem and shape of the objective function

    Find the minimizer of the function f(x)=ψ(x)+i=1mϕi(x)f(x) = \psi(x) + \sum_{i=1}^{m} \phi_i(x) where

    • All functions are from RnR\mathbb{R}^n\to \mathbb{R}
    • Each ϕi\phi_i is a piecewise function, i.e. there is a partition πi:Rn{1,2,,pi}\pi_i : \mathbb{R}^n \to \{1,2,\ldots, p_i\} so that on the j-th partition πi1(j)\pi_i^{-1}(j), ϕi=ϕi(j)\phi_i = \phi_i^{(j)}
    • Convexity and continuity assumptions about ff
      • Assume ψ\psi, ϕi\phi_i, ϕi(j)\phi_i^{(j)} are all convex and lower semicontinuous

      • Assume ψ\psi is 1-strongly-convex, i.e.

        ψ(y)ψ(x)+yx,f(x)+12yx2\psi(y) \geq \psi(x) + \langle y-x, \nabla f(x) \rangle + \frac{1}{2}||y-x||^2

  • How to measure progress

    In the tt-th iteration

    • ftf_t is the relaxed objective function that is minimized, which will be arranged so that ftff_t\leq f
    • BlitzWS computes two iterates
      • xtx_t which is the argmin of ftf_t
      • A second iterate yty_t
    • Sub-optimality gap for the tt-th iteration is defined to be Δt:=f(yt)ft(xt)\Delta_t := f(y_t)-f_t(x_t)
    • Let xx^* be the minimizer of ff. Then note that ft(xt)f(x)f(yt)f_t(x_t)\leq f(x^*) \leq f(y_t) and so Δt0\Delta_t \geq 0
    • Δt=0f(yt)=f(x)\Delta_t = 0 \implies f(y_t) = f(x^*) and hence Δt\Delta_t decrease is progress!
  • BlitzWS algorithm

    In the tt-th iteration:

    • Select progress parameter ξt(0,1]\xi_t\in (0,1] (Note that a larger ξt\xi_t will imply a guarantee of more progress - see theorem below)
    • Determine relaxed objective function ftf_t which will be of the shape ft=ψ+i=1mϕi,tf_t = \psi + \sum_{i=1}^{m} \phi_{i,t} where ϕi,t\phi_{i,t} will either be ϕi\phi_i or one of the pieces ϕi(j)\phi_i^{(j)}
    • Compute xtx_t, the minimizer of ftf_t (i.e. solve subproblem tt)
    • Compute second iterate yty_t to be minimizer of f(x)f(x) subject to the constraint that it lies on the line segment joining yt1y_{t-1} and xtx_{t}

    In the 00-th iteration:

    • Choose any ϕi,0ϕi\phi_{i,0} \leq \phi_i and any y0y_0 so that f(y0)<f(y_0)< \infty

    Termination condition

    • Run algorithm till xT=yTx_T = y_T. At this point, return yTy_T
  • Theoretical guarantee and results

    • At any time-step t>0t>0, BlitzWS guarantees a decrease in the sub-optimality gap controlled by the progress parameter as follows : Δt(1ξt)Δt1\Delta_t \leq (1-\xi_t)\Delta_{t-1}
    • BlitzWS can also be modified to accommodate approximate solutions to subproblems (instead of finding precise minimizers xtx_t of ftf_t)

Raw Notes

If you like applying these kinds of methods practical ML problems, join our team.


Read more

glmnet | February 23, 2021

This week, we discuss glment, the widely-renowned, frequently-used, and efficient Lasso optimization package.

Read more

Quasi-Newton part two: BFGS | October 27, 2020

A discussion of Quasi-Newton methods that create approximations that meet this criterion and thus enjoy global and local convergence properties.

Read more