Robust Momentum

Source file

wc_robust_momentum(mu, L, lam; solver=Clarabel.Optimizer, verbose=true)

Problem statement

Compute a PEPit worst-case guarantee for wc_robust_momentum.

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where $f$ is $L$-smooth and $\mu$-strongly-convex.

Performance metric

This code computes a worst-case guarantee for the robust momentum method (RMM). That is, it computes the smallest possible $\tau(n, \mu, L, \lambda)$ such that the guarantee

\[v(x_{n+1}) \leqslant \tau(n, \mu, L, \lambda) v(x_{n}),\]

is valid, where $x_n$ is the $n^{\mathrm{th}}$ iterate of the RMM, and $x_\star$ is a minimizer of $f$. The function $v(.)$ is a well-chosen Lyapunov defined as follows,

\[ \begin{aligned} v(x_t) & = & l\|z_t - x_\star\|^2 + q_t, \\ q_t & = & (L - \mu) \left(f(x_t) - f_\star - \frac{\mu}{2}\|y_t - x_\star\|^2 - \frac{1}{2}\|\nabla f(y_t) - \mu (y_t - x_\star)\|^2 \right), \end{aligned}\]

with $\kappa = \frac{\mu}{L}$, $\rho = \lambda (1 - \frac{1}{\kappa}) + (1 - \lambda) \left(1 - \frac{1}{\sqrt{\kappa}}\right)$, and $l = \mu^2 \frac{\kappa - \kappa \rho^2 - 1}{2 \rho (1 - \rho)}$.

Algorithm

For $t \in \{0, \dots, n-1\}$,

\[ \begin{aligned} x_{t+1} & = & x_{t} + \beta (x_t - x_{t-1}) - \alpha \nabla f(y_t), \\ y_{t+1} & = & y_{t} + \gamma (x_t - x_{t-1}), \end{aligned}\]

with $x_{-1}, x_0 \in \mathrm{R}^d$, and with parameters $\alpha = \frac{\kappa (1 - \rho^2)(1 + \rho)}{L}$, $\beta = \frac{\kappa \rho^3}{\kappa - 1}$, $\gamma = \frac{\rho^2}{(\kappa - 1)(1 - \rho)^2(1 + \rho)}$.

Theoretical guarantee

A convergence guarantee (empirically tight) is obtained in [1, Theorem 1],

\[v(x_{n+1}) \leqslant \rho^2 v(x_n),\]

with $\rho = \lambda (1 - \frac{1}{\kappa}) + (1 - \lambda) \left(1 - \frac{1}{\sqrt{\kappa}}\right)$.

References

[1] S. Cyrus, B. Hu, B. Van Scoy, L. Lessard (2018). A robust accelerated optimization algorithm for strongly convex functions. American Control Conference (ACC).

Arguments

mu: strong convexity or monotonicity parameter, as used by the modeled class.
L: smoothness or Lipschitz parameter, as used by the modeled class.
lam: if $\lambda=1$ it is the gradient descent, if $\lambda=0$, it is the Triple Momentum Method.
solver: JuMP optimizer constructor used to solve the generated SDP.
verbose: print example and solver progress information when true.

Returns

pepit_tau: worst-case value
theoretical_tau: theoretical value

Julia usage

pepit_tau, theoretical_tau = wc_robust_momentum(0.1, 1.0, 0.2; solver=Clarabel.Optimizer, verbose=true)
## Returns approximately: (pepit_tau, theoretical_tau) = (0.528555, 0.528555)