Frank Wolfe

Source file

wc_frank_wolfe(L, D, n; solver=Clarabel.Optimizer, verbose=true)

Problem statement

Compute a PEPit worst-case guarantee for wc_frank_wolfe.

Consider the composite convex minimization problem

\[F_\star \triangleq \min_x \{F(x) \equiv f_1(x) + f_2(x)\},\]

where $f_1$ is $L$-smooth and convex and where $f_2$ is a convex indicator function on $\mathcal{D}$ of diameter at most $D$.

Performance metric

This code computes a worst-case guarantee for the conditional gradient method, aka Frank-Wolfe method, and looks for a low-dimensional worst-case example nearly achieving this worst-case guarantee using $12$ iterations of the logdet heuristic. That is, it computes the smallest possible $\tau(n, L)$ such that the guarantee

\[F(x_n) - F(x_\star) \leqslant \tau(n, L) D^2,\]

is valid, where $x_n$ is the output of the conditional gradient method, and where $x_\star$ is a minimizer of $F$. In short, for given values of $n$ and $L$, $\tau(n, L)$ is computed as the worst-case value of $F(x_n) - F(x_\star)$ when $D \leqslant 1$. Then, it looks for a low-dimensional nearly achieving this performance.

Algorithm

This method was first presented in [1]. A more recent version can be found in, e.g., [2, Algorithm 1]. For $t \in \{0, \dots, n-1\}$,

\[ \begin{aligned} y_t & = & \arg\min_{s \in \mathcal{D}} \langle s \mid \nabla f_1(x_t) \rangle, \\ x_{t+1} & = & \frac{t}{t + 2} x_t + \frac{2}{t + 2} y_t. \end{aligned}\]

Theoretical guarantee

An upper guarantee obtained in [2, Theorem 1] is

\[F(x_n) - F(x_\star) \leqslant \frac{2L D^2}{n+2}.\]

References

The algorithm is presented in, among others, [1, 2]. The logdet heuristic is presented in [3].

[1] M. Frank, P. Wolfe (1956). An algorithm for quadratic programming. Naval research logistics quarterly, 3(1-2), 95-110.

[2] M. Jaggi (2013). Revisiting Frank-Wolfe: Projection-free sparse convex optimization. In 30th International Conference on Machine Learning (ICML).

[3] F. Maryam, H. Hindi, S. Boyd (2003). Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices. American Control Conference (ACC).

Arguments

  • L: smoothness or Lipschitz parameter, as used by the modeled class.
  • D: diameter of $f_2$.
  • n: number of iterations.
  • solver: JuMP optimizer constructor used to solve the generated SDP.
  • verbose: print example and solver progress information when true.

Returns

  • pepit_tau: worst-case value.
  • theoretical_tau: theoretical value.

Julia usage

pepit_tau, theoretical_tau = wc_frank_wolfe(1.0, 1.0, 10; solver=Clarabel.Optimizer, verbose=true)
## Returns approximately: (pepit_tau, theoretical_tau) = (0.07828, 0.166667)