Gradient Descent Contraction

Source file

wc_gradient_descent_contraction(L, mu, gamma, n; solver=Clarabel.Optimizer, verbose=true)

Problem statement

Compute a PEPit worst-case guarantee for wc_gradient_descent_contraction.

Consider the convex minimization problem

\[f_\star \triangleq \min_x f(x),\]

where $f$ is $L$-smooth and $\mu$-strongly convex.

Performance metric

This code computes a worst-case guarantee for gradient descent with fixed step-size $\gamma$. That is, it computes the smallest possible $\tau(n, L, \mu, \gamma)$ such that the guarantee

\[\| x_n - y_n \|^2 \leqslant \tau(n, L, \mu, \gamma) \| x_0 - y_0 \|^2\]

is valid, where $x_n$ and $y_n$ are the outputs of the gradient descent method with fixed step-size $\gamma$, starting respectively from $x_0$ and $y_0$.

In short, for given values of $n$, $L$, $\mu$ and $\gamma$, $\tau(n, L, \mu \gamma)$ is computed as the worst-case value of $\| x_n - y_n \|^2$ when $\| x_0 - y_0 \|^2 \leqslant 1$.

Algorithm

For $t\in\{0,1,\ldots,n-1\}$, gradient descent is described by

\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]

where $\gamma$ is a step-size.

Theoretical guarantee

The tight theoretical guarantee is

\[\| x_n - y_n \|^2 \leqslant \max\{(1-L\gamma)^2,(1-\mu \gamma)^2\}^n\| x_0 - y_0 \|^2,\]

which is tight on simple quadratic functions.

References

No specific bibliographic reference is associated with this example.

Arguments

  • L: smoothness or Lipschitz parameter, as used by the modeled class.
  • mu: strong convexity or monotonicity parameter, as used by the modeled class.
  • gamma: step-size parameter.
  • n: number of iterations.
  • solver: JuMP optimizer constructor used to solve the generated SDP.
  • verbose: print example and solver progress information when true.

Returns

  • pepit_tau: worst-case value
  • theoretical_tau: theoretical value

Julia usage

pepit_tau, theoretical_tau = wc_gradient_descent_contraction(L, mu, gamma, n; verbose=true)
## Returns approximately: (pepit_tau, theoretical_tau) = (0.81, 0.81)