Optimized Gradient

Source file

wc_optimized_gradient(L, n; verbose=true)

Problem statement

Compute a PEPit worst-case guarantee for wc_optimized_gradient.

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where $f$ is $L$-smooth and convex.

Performance metric

This code computes a worst-case guarantee for optimized gradient method (OGM), and applies the trace heuristic for trying to find a low-dimensional worst-case example on which this guarantee is nearly achieved. That is, it computes the smallest possible $\tau(n, L)$ such that the guarantee

\[f(x_n) - f_\star \leqslant \tau(n, L) \|x_0 - x_\star\|^2\]

is valid, where $x_n$ is the output of OGM and where $x_\star$ is a minimizer of $f$. Then, it applies the trace heuristic, which allows obtaining a one-dimensional function on which the guarantee is nearly achieved.

Algorithm

The optimized gradient method is described by

\[ \begin{aligned} x_{t+1} & = & y_t - \frac{1}{L} \nabla f(y_t)\\ y_{t+1} & = & x_{t+1} + \frac{\theta_{t}-1}{\theta_{t+1}}(x_{t+1}-x_t)+\frac{\theta_{t}}{\theta_{t+1}}(x_{t+1}-y_t), \end{aligned}\]

with

\[ \begin{aligned} \theta_0 & = & 1 \\ \theta_t & = & \frac{1 + \sqrt{4 \theta_{t-1}^2 + 1}}{2}, \forall t \in [|1, n-1|] \\ \theta_n & = & \frac{1 + \sqrt{8 \theta_{n-1}^2 + 1}}{2}. \end{aligned}\]

Theoretical guarantee

The tight theoretical guarantee can be found in [2, Theorem 2]:

\[f(x_n)-f_\star \leqslant \frac{L\|x_0-x_\star\|^2}{2\theta_n^2}.\]

References

The OGM was developed in [1,2]. Low-dimensional worst-case functions for OGM were obtained in [3, 4].

[1] Y. Drori, M. Teboulle (2014). Performance of first-order methods for smooth convex minimization: a novel approach. Mathematical Programming 145(1-2), 451-482.

[2] D. Kim, J. Fessler (2016). Optimized first-order methods for smooth convex minimization. Mathematical Programming 159.1-2: 81-107.

[3] A. Taylor, J. Hendrickx, F. Glineur (2017). Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Mathematical Programming, 161(1-2), 307-345.

[4] D. Kim, J. Fessler (2017). On the convergence analysis of the optimized gradient method. Journal of Optimization Theory and Applications, 172(1), 187-205.

Arguments

  • L: smoothness or Lipschitz parameter, as used by the modeled class.
  • n: number of iterations.
  • verbose: print example and solver progress information when true.

Returns

  • pepit_tau: worst-case value
  • theoretical_tau: theoretical value

Julia usage

pepit_tau, theoretical_tau = wc_optimized_gradient(3.0, 4; verbose=true)
## Returns approximately: (pepit_tau, theoretical_tau) = (0.076752, 0.076752)