Proximal Gradient Quadratics

Source file

wc_proximal_gradient_quadratics(L, mu, gamma, n; solver=Clarabel.Optimizer, verbose=true)

Problem statement

Compute a PEPit worst-case guarantee for wc_proximal_gradient_quadratics.

Consider the composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x)\},\]

where $f_1$ is $L$-smooth, $\mu$-strongly convex and quadratic, and where $f_2$ is closed convex and proper.

Performance metric

This code computes a worst-case guarantee for the proximal gradient method (PGM). That is, it computes the smallest possible $\tau(n, L, \mu)$ such that the guarantee

\[\|x_n - x_\star\|^2 \leqslant \tau(n, L, \mu) \|x_0 - x_\star\|^2,\]

is valid, where $x_n$ is the output of the proximal gradient, and where $x_\star$ is a minimizer of $F$. In short, for given values of $n$, $L$ and $\mu$, $\tau(n, L, \mu)$ is computed as the worst-case value of $\|x_n - x_\star\|^2$ when $\|x_0 - x_\star\|^2 \leqslant 1$.

Algorithm

Proximal gradient is described by

\[ \begin{aligned} y_t & = & x_t - \gamma \nabla f_1(x_t), \\ x_{t+1} & = & \arg\min_x \left\{f_2(x)+\frac{1}{2\gamma}\|x-y_t\|^2 \right\}, \end{aligned}\]

for $t \in \{ 0, \dots, n-1\}$ and where $\gamma$ is a step-size.

Theoretical guarantee

It is well known that a tight guarantee for PGM is provided by

\[\|x_n - x_\star\|^2 \leqslant \max\{(1-L\gamma)^2,(1-\mu\gamma)^2\}^n \|x_0 - x_\star\|^2,\]

which can be found in, e.g., [1, Theorem 3.1]. It is a folk knowledge and the result can be found in many references for gradient descent; see, e.g.,[2, Section 1.4: Theorem 3], [3, Section 5.1] and [4, Section 4.4].

References

[1] A. Taylor, J. Hendrickx, F. Glineur (2018). Exact worst-case convergence rates of the proximal gradient method for composite convex minimization. Journal of Optimization Theory and Applications, 178(2), 455-476.

[2] B. Polyak (1987). Introduction to Optimization. Optimization Software New York.

[3] E. Ryu, S. Boyd (2016). A primer on monotone operator methods. Applied and Computational Mathematics 15(1), 3-43.

[4] L. Lessard, B. Recht, A. Packard (2016). Analysis and design of optimization algorithms via integral quadratic constraints. SIAM Journal on Optimization 26(1), 57-95.

Arguments

  • L: smoothness or Lipschitz parameter, as used by the modeled class.
  • mu: strong convexity or monotonicity parameter, as used by the modeled class.
  • gamma: step-size parameter.
  • n: number of iterations.
  • solver: JuMP optimizer constructor used to solve the generated SDP.
  • verbose: print example and solver progress information when true.

Returns

  • pepit_tau: worst-case value.
  • theoretical_tau: theoretical value.

Julia usage

pepit_tau, theoretical_tau = wc_proximal_gradient_quadratics(1.0, 0.1, 1.0, 2; solver=Clarabel.Optimizer, verbose=true)
## Returns approximately: (pepit_tau, theoretical_tau) = (0.6561, 0.6561)