Gradient Descent
wc_gradient_descent(L, gamma, n; solver=Clarabel.Optimizer, verbose=true)Problem statement
Compute a PEPit worst-case guarantee for wc_gradient_descent.
Consider the minimization problem
\[f_\star \triangleq \min_x f(x),\]
where $f$ is $L$-smooth.
Performance metric
This code computes a worst-case guarantee for gradient descent with fixed step-size $\gamma$, and looks for a low-dimensional worst-case example nearly achieving this worst-case guarantee. That is, it computes the smallest possible $\tau(n, L, \gamma)$ such that the guarantee
\[\min_{t\leqslant n} \|\nabla f(x_t)\|^2 \leqslant \tau(n, L, \gamma) (f(x_0) - f(x_n))\]
is valid, where $x_n$ is the n-th iterates obtained with the gradient method with fixed step-size. Then, it looks for a low-dimensional nearly achieving this performance.
Algorithm
Gradient descent is described as follows, for $t \in \{ 0, \dots, n-1\}$,
\[x_{t+1} = x_t - \gamma \nabla f(x_t),\]
where $\gamma$ is a step-size and.
Theoretical guarantee
When $\gamma \leqslant \frac{1}{L}$, an empirically tight theoretical worst-case guarantee is
\[\min_{t\leqslant n} \|\nabla f(x_t)\|^2 \leqslant \frac{4}{3}\frac{L}{n} (f(x_0) - f(x_n)),\]
see discussions in [1, page 190] and [2].
References
Arguments
L: smoothness or Lipschitz parameter, as used by the modeled class.gamma: step-size parameter.n: number of iterations.solver: JuMP optimizer constructor used to solve the generated SDP.verbose: print example and solver progress information when true.
Returns
pepit_tau: worst-case value.theoretical_tau: theoretical value.
Julia usage
pepit_tau, theoretical_tau = wc_gradient_descent(L, gamma, 5; solver=Clarabel.Optimizer, verbose=true)
## Returns approximately: (pepit_tau, theoretical_tau) = (0.266657, 0.266667)