Polyak Steps In Distance To Optimum

Source file

wc_polyak_steps_in_distance_to_optimum(L, mu, gamma; solver=Clarabel.Optimizer, verbose=true)

Problem statement

Compute a PEPit worst-case guarantee for wc_polyak_steps_in_distance_to_optimum.

Consider the minimization problem

\[f_\star \triangleq \min_x f(x),\]

where $f$ is $L$-smooth and $\mu$-strongly convex, and $x_\star=\arg\min_x f(x)$.

Performance metric

This code computes a worst-case guarantee for a variant of a gradient method relying on Polyak step-sizes (PS). That is, it computes the smallest possible $\tau(L, \mu, \gamma)$ such that the guarantee

\[\|x_{t+1} - x_\star\|^2 \leqslant \tau(L, \mu, \gamma) \|x_{t} - x_\star\|^2\]

is valid, where $x_t$ is the output of the gradient method with PS and $\gamma$ is the effective value of the step-size of the gradient method with PS.

In short, for given values of $L$, $\mu$, and $\gamma$, $\tau(L, \mu, \gamma)$ is computed as the worst-case value of $\|x_{t+1} - x_\star\|^2$ when $\|x_{t} - x_\star\|^2 \leqslant 1$.

Algorithm

Gradient descent is described by

\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]

where $\gamma$ is a step-size. The Polyak step-size rule under consideration here corresponds to choosing of $\gamma$ satisfying:

\[\gamma \|\nabla f(x_t)\|^2 = 2 (f(x_t) - f_\star).\]

Theoretical guarantee

The gradient method with the variant of Polyak step-sizes under consideration enjoys the tight theoretical guarantee [1, Proposition 1]:

\[\|x_{t+1} - x_\star\|^2 \leqslant \tau(L, \mu, \gamma) \|x_{t} - x_\star\|^2,\]

where $\gamma$ is the effective step-size used at iteration $t$ and

\[ \begin{aligned} \tau(L, \mu, \gamma) & = & \left\{\begin{array}{ll} \frac{(\gamma L-1)(1-\gamma \mu)}{\gamma(L+\mu)-1} & \text{if } \gamma\in[\tfrac{1}{L},\tfrac{1}{\mu}],\\ 0 & \text{otherwise.} \end{array}\right. \end{aligned}\]

References

[1] M. Barre, A. Taylor, A. d'Aspremont (2020). Complexity guarantees for Polyak steps with momentum. In Conference on Learning Theory (COLT).

Arguments

  • L: smoothness or Lipschitz parameter, as used by the modeled class.
  • mu: strong convexity or monotonicity parameter, as used by the modeled class.
  • gamma: step-size parameter.
  • solver: JuMP optimizer constructor used to solve the generated SDP.
  • verbose: print example and solver progress information when true.

Returns

  • pepit_tau: worst-case value
  • theoretical_tau: theoretical value

Julia usage

pepit_tau, theoretical_tau = wc_polyak_steps_in_distance_to_optimum(1.0, 0.1, 2 / (1.0 + 0.1); solver=Clarabel.Optimizer, verbose=true)
## Returns approximately: (pepit_tau, theoretical_tau) = (0.669421, 0.669421)