The workshop will take place on Monday, February 13 (starting at 11:30 a.m.) and Tuesday, February 14 (ending at 3 p.m.).
11:45am
|
Opening talk - Adrien Taylor (Inria Paris)
|
Biography
Adrien obtained his PhD in early 2017 from the Department of Mathematical Engineering of the Université catholique de Louvain. He was then a postdoctoral researcher in the SIERRA team at Inria Paris from 2017 to 2019, before becoming a research scientist in the same team in 2019.
|
My personal view on computer-assisted analyses and design of optimization methods via performance
estimation problems
Abstract
In this presentation, I want to provide an overview of the performance estimation approach for
analyzing and designing optimization methods. I will focus on what I believe are the key ingredients,
strengths, and weaknesses of the methodology. I will also discuss a few aspects and connections that I believe
have been somewhat overlooked in recent years, as well as the use of performance estimation beyond
its comfort zone.
References:
[1] B. Goujaud, C. Moucer, F. Glineur, J. M. Hendrickx, A. Taylor, A. Dieuleveut, “PEPit: computer-assisted worst-case analyses of first-order optimization methods in Python”, 2023. PDF
[2] Adrien Taylor, Francis Bach, “Stochastic first-order methods: non-asymptotic and computer-aided analyses via potential functions”, 2021. PDF
[ Preprint , Slides , Video ]
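As a complement to reference [1], here is a minimal PEPit sketch (illustrative values only, not taken from the talk) computing the worst-case accuracy of a single gradient step on a smooth convex function:

    # Minimal sketch using PEPit (reference [1]); the smoothness constant, step size and
    # initial distance below are illustrative choices.
    from PEPit import PEP
    from PEPit.functions import SmoothConvexFunction

    problem = PEP()
    L = 1.0
    f = problem.declare_function(SmoothConvexFunction, L=L)

    xs = f.stationary_point()                  # an optimal point of f
    x0 = problem.set_initial_point()           # the algorithm's starting point
    problem.set_initial_condition((x0 - xs) ** 2 <= 1)

    x1 = x0 - (1 / L) * f.gradient(x0)         # one gradient step with step size 1/L

    problem.set_performance_metric(f(x1) - f(xs))
    worst_case = problem.solve(verbose=0)      # worst case over all L-smooth convex functions
    print(worst_case)                          # expected to match the tight bound L/(4N+2) with N=1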
|
2:00pm
|
Talks Session 1
|
Anne Rubens (UCLouvain) - A novel approach for interpolation
In this talk, we present a novel approach for the interpolation problem of given function classes: what
necessary and sufficient conditions must a set of data satisfy to ensure the existence of a function
of the class defined on the whole space and interpolating the data? The derivation of such conditions
is crucial in the PEP framework, since a priori tight performance guarantees can only be obtained
for function classes whose interpolation conditions are known.
Our method for analyzing such conditions, and
hence extending the range of applications of the framework, is the following: given a constraint
satisfied by a finite set of data, we require it to be extensible to any arbitrary new point,
rather than interpolable by a function defined on the whole space. This approach to the interpolation
problem allows one to set aside all analytic properties of the function class and work only at an
algebraic level. As a by-product, it makes it easy to obtain counterexamples when a condition
characterizing a function class is not interpolable. As an illustration, we provide interpolation
conditions for weakly convex functions whose quasi-subgradient is bounded, and weakly convex functions
satisfying a sharpness condition.
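For background on what such interpolation conditions look like (a classical result, not one of the new conditions of this talk), recall the case of \(L\)-smooth convex functions: a data set \(\{(x_i, g_i, f_i)\}_i\) can be interpolated by an \(L\)-smooth convex function \(f\) with \(f(x_i) = f_i\) and \(\nabla f(x_i) = g_i\) if and only if, for all pairs \(i, j\),
\[
f_i \;\geq\; f_j + \langle g_j, x_i - x_j \rangle + \frac{1}{2L}\, \| g_i - g_j \|^2 .
\]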
Sebastian Banert (Lund University) - Deriving algorithms with deviations
This talk will present a methodology to derive algorithms for convex optimisation and monotone inclusions.
It is based on performance estimation of one step of a Lyapunov function, carried out with symbolic calculations.
The resulting algorithms allow for a trade-off between flexibility and guaranteed performance, where
the former is expressed by the possibility of choosing the iterate within a ball whose radius
is given by quantities that have already been computed. We call this concept "deviations"; it allows
for heuristic modifications of existing algorithms (e.g., by means of deep learning) while retaining
convergence or performance guarantees.
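A schematic illustration of the deviation idea (a sketch under stated assumptions, not the speaker's actual method): take a nominal step, then accept any point within the allowed ball around it, e.g. a heuristic or learned correction projected back into that ball.

    # Schematic sketch of the "deviations" idea: take a nominal gradient step, then accept
    # any point within a ball around it whose radius is computed from already available
    # quantities; all oracles below are assumptions of the sketch.
    import numpy as np

    def step_with_deviation(x, grad, gamma, radius_fn, heuristic_fn):
        nominal = x - gamma * grad(x)            # nominal update
        r = radius_fn(x, nominal)                # allowed deviation radius
        proposal = heuristic_fn(nominal)         # e.g. a learned or heuristic correction
        d = proposal - nominal
        norm_d = np.linalg.norm(d)
        if norm_d > r:                           # project the proposal back into the ball
            proposal = nominal + (r / norm_d) * d
        return proposal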
Sébastien Colla (UCLouvain) - Automated Performance Estimation for Decentralized Optimization via Network Size Independent Problems
We present a methodology to automatically compute worst-case performance bounds for a large class of
first-order decentralized optimization algorithms. These algorithms aim at minimizing the average
of local functions that are distributed across a network of agents. They typically combine local
computations and consensus steps. Our methodology is based on the Performance Estimation
Problem (PEP) approach, which allows computing the worst-case performance and a worst-case instance of
first-order optimization algorithms by solving an SDP.
In previous work, we proposed two ways of representing consensus steps in PEPs, which allow writing and solving PEPs for decentralized optimization.
In these formulations, the size of the PEP grows with the number of agents in the network, but in most cases the worst-case guarantees obtained do not change with the number of agents.
In this work, we therefore propose a new PEP formulation for decentralized optimization whose size is independent of the number of agents in the network.
For this purpose, we take a global view of the decentralized problem and decouple the consensus subspace from its orthogonal complement.
We apply our methodology to different decentralized methods such as DGD, DIGing and EXTRA and obtain numerically tight performance guarantees that are valid for any network size.
[ Preprint , Slides , Video ]
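For reference, a minimal sketch of one iteration of decentralized gradient descent (DGD), one of the methods analyzed; the mixing matrix W and the local gradient oracles are assumptions of the sketch.

    # Minimal sketch of one decentralized gradient descent (DGD) iteration; the mixing
    # matrix W (doubly stochastic) and the local gradient oracles are assumptions.
    import numpy as np

    def dgd_step(X, W, grads, alpha):
        """X: (n_agents, dim) stacked local iterates; grads: list of local gradient oracles."""
        mixed = W @ X                                        # consensus (averaging) step
        local = np.stack([g(x) for g, x in zip(grads, X)])   # local gradient computations
        return mixed - alpha * local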
|
3:30pm
|
Talks Session 2
|
Hadi Abbaszadehpeivasti (Tilburg University) - Convergence rate analysis of the gradient
descent-ascent method for convex-concave saddle-point problems
In this talk, I present our study on the gradient descent-ascent method for convex-concave saddle-point
problems. We derive a new non-asymptotic global convergence rate in terms of distance to the solution
set by using the semidefinite programming performance estimation method. The given convergence rate
incorporates most parameters of the problem and, for one iteration, is exact for a large class of strongly
convex-strongly concave saddle-point problems. We also investigate the algorithm without strong
convexity, and we provide some necessary and sufficient conditions under which the gradient
descent-ascent method enjoys linear convergence.
[ Preprint , Slides , Video ]
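For reference, the gradient descent-ascent iteration analyzed above can be sketched as follows (the gradient oracles and step sizes are illustrative assumptions):

    # Sketch of one gradient descent-ascent (GDA) step on a saddle function f(x, y):
    # descend in x, ascend in y.
    def gda_step(x, y, grad_x, grad_y, alpha, beta):
        x_new = x - alpha * grad_x(x, y)   # descent step on the (convex) variable x
        y_new = y + beta * grad_y(x, y)    # ascent step on the (concave) variable y
        return x_new, y_new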
Eduard Gorbunov (Mohamed bin Zayed University of Artificial Intelligence) - Convergence of
Proximal Point and Extragradient-Based Methods Beyond Monotonicity: the Case of Negative Comonotonicity
Algorithms for min-max optimization and variational inequalities are often studied under monotonicity
assumptions. Motivated by non-monotone machine learning applications, we follow the line of works
[Diakonikolas et al., 2021, Lee and Kim, 2021, Pethick et al., 2022, Böhm, 2022] aiming at going
beyond monotonicity by considering the weaker negative comonotonicity assumption. In particular,
we provide tight complexity analyses for the Proximal Point, Extragradient, and Optimistic Gradient
methods in this setup, closing some questions on their working guarantees beyond monotonicity.
[ Preprint , Slides , Video ]
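For reference, a minimal sketch of the extragradient update for an operator \(F\) (the step size is an illustrative assumption):

    # Sketch of the extragradient update for an operator F (e.g. the field of a min-max
    # problem); the step size gamma is an illustrative assumption.
    def extragradient_step(z, F, gamma):
        z_half = z - gamma * F(z)       # extrapolation step
        return z - gamma * F(z_half)    # main update, evaluated at the extrapolated point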
Moslem Zamani (Tilburg University) - The exact worst-case convergence rate of the alternating
direction method of multipliers
In this talk, we derive new non-ergodic convergence rates for the alternating direction method of
multipliers (ADMM) by using performance estimation. We give some examples that show the exactness
of the given bounds. Moreover, we study the linear (R-linear) convergence of ADMM under some
assumptions that are weaker than the existing ones in the literature.
[ Preprint , Slides , Video ]
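For reference, a schematic (scaled-form) ADMM iteration for \(\min_{x,z} f(x) + g(z)\) subject to \(Ax + Bz = c\); the argmin oracles below are placeholders for problem-specific subproblem solvers:

    # Schematic (scaled-form) ADMM iteration; argmin_x and argmin_z stand in for the
    # problem-specific subproblem solvers and are assumptions of the sketch.
    def admm_step(x, z, u, argmin_x, argmin_z, A, B, c):
        x = argmin_x(z, u)          # x-update: argmin_x f(x) + (rho/2) * ||A x + B z - c + u||^2
        z = argmin_z(x, u)          # z-update: argmin_z g(z) + (rho/2) * ||A x + B z - c + u||^2
        u = u + A @ x + B @ z - c   # scaled dual update
        return x, z, u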
|
5:00pm
|
Long Talk - Etienne de Klerk (Tilburg University)
|
Biography
Etienne de Klerk completed his BSc and MSc degrees at the University of Pretoria in South Africa, and
obtained his PhD degree from the Delft University of Technology in The Netherlands in 1997. From
January 1998 to September 2003, he held assistant professorships at the Delft University of Technology,
and from September 2003 to September 2005 an associate professorship at the University of Waterloo,
Canada, in the Department of Combinatorics & Optimization. In September 2004 he was appointed at
Tilburg University, The Netherlands, first as an associate professor, and then as full professor
(from June 2009). From August 29th, 2012, until August 31st, 2013, he was also appointed as full
professor in the Division of Mathematics of the School of Physical and Mathematical Sciences at
the Nanyang Technological University in Singapore. From September 1st, 2015 to August 31st 2019,
he also held a full professorship at the Delft University of Technology (1 day/week position).
|
Some recent advances in SDP performance analysis
Abstract
Semidefinite programming (SDP) performance analysis was introduced in 2014 in a seminal paper by
Drori and Teboulle. Since then it has become a standard tool for establishing the (often exact) rates
of convergence of many iterative methods, by formulating the worst-case convergence rate as the optimal
value of an SDP problem. This has yielded new insight into many methods, including gradient descent,
Newton’s method for self-concordant functions, ADMM, DCA, etc. In this talk we will review some recent
results on SDP performance analysis for iterative methods. The talk will be based on joint work with
Hadi Abbaszadehpeivasti and Moslem Zamani.
[ Preprint , Slides ]
|
9:00am
|
Talks Session 3
|
Nizar Bousselmi (UCLouvain) - Performance Estimation of First-Order Methods on Functions composed
with Linear Mappings
We develop a Performance Estimation Problem representation for linear mappings. We consider convex
optimization problems involving linear mappings, such as those whose objective functions include
compositions of the type \(g(Mx)\), or featuring linear constraints of the form \(Mx=b\). First-order methods
designed to tackle these problems will typically exploit their specific structure and will need to
compute at each iteration products of iterates by matrices \(M\) or \(M^T\).
Our goal is to identify the worst-case behavior of such first-order methods, based on the Performance
Estimation Problem (PEP) methodology. We develop interpolation conditions for linear operators \(M\) and
\(M^T\). This allows us to embed them in the PEP framework and thus to evaluate the worst-case performance
of a wide variety of problems involving linear mappings. We cover both the symmetric and nonsymmetric
cases and allow bounds on the spectrum of these operators (lower and upper bounds on the eigenvalues
in the symmetric case, maximum singular value in the nonsymmetric case). As a byproduct we also
obtain interpolation conditions and worst-case performance for the class of convex quadratic functions.
We demonstrate the scope of our tool by computing several tight worst-case convergence rates,
including that of the gradient method applied to the minimization of \(g(Mx)\) and that of the
Chambolle-Pock algorithm.
[ Preprint , Slides , Video ]
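As a small illustration of the structure exploited above, one gradient step on \(g(Mx)\) uses the chain rule \(\nabla (g \circ M)(x) = M^T \nabla g(Mx)\), so each iteration indeed multiplies by \(M\) and \(M^T\) (the oracles below are assumptions of the sketch):

    # One gradient step on g(Mx): by the chain rule the gradient is M^T grad_g(Mx),
    # so each iteration multiplies by M and by M^T; g, grad_g and M are assumptions.
    def gradient_step_composed(x, M, grad_g, gamma):
        return x - gamma * (M.T @ grad_g(M @ x))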
Céline Moucer (Inria Paris) - A systematic approach to Lyapunov analyses of continuous-time
models in convex optimization
First-order methods are often analyzed via their continuous-time models, where their worst-case convergence
properties are usually approached via Lyapunov functions. In this work, we provide a systematic and
principled approach to find and verify Lyapunov functions for classes of ordinary and stochastic
differential equations. More precisely, we extend the performance estimation framework, originally
proposed by Drori and Teboulle (2012), to continuous-time models. We retrieve convergence results
comparable to those of discrete methods using fewer assumptions and convexity inequalities, and provide
new results for stochastic accelerated gradient flows.
[ Preprint , Slides , Video ]
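A classical instance of the kind of Lyapunov analysis this talk systematizes (a standard result, not a new one from the talk): for the gradient flow \(\dot{x}(t) = -\nabla f(x(t))\) with \(f\) convex, the function
\[
\mathcal{V}(t) = t\,\big(f(x(t)) - f_\star\big) + \tfrac{1}{2}\,\| x(t) - x_\star \|^2
\]
is nonincreasing along trajectories, which immediately yields \(f(x(t)) - f_\star \le \|x(0) - x_\star\|^2 / (2t)\).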
Yassine Kamri (UCLouvain) - On the worst-case analysis of block coordinate-wise algorithms
Block coordinate-wise algorithms are an essential class of first-order optimization methods widely used
to solve large-scale optimization problems. However, their worst-case performance is still not well
understood and their practical success has not yet been explained by existing convergence analyses.
We analyze the worst-case behavior of cyclic versions of block coordinate-wise algorithms in the
context of unconstrained optimization of convex functions with coordinate-wise Lipschitz gradients.
For this purpose, we rely on the recently proposed Performance Estimation Problem (PEP) framework and develop a
new characterization for this class of functions, from which we obtain necessary interpolation conditions.
In this work, we present a unifying framework for the worst-case analysis of block coordinate-wise
algorithms, and in some situations we are able to substantially improve on the best current bounds.
Preprint: https://arxiv.org/pdf/2211.17018.pdf
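For reference, one pass of a cyclic block coordinate gradient method with block-wise step sizes \(1/L_i\), the type of scheme considered above, can be sketched as follows (the block gradient oracle is an assumption of the sketch):

    # One pass of a cyclic block coordinate gradient method with block-wise steps 1/L_i;
    # grad_block(x, i) returns the gradient of f with respect to block i and is an assumption.
    def cyclic_bcd_pass(x, grad_block, blocks, L):
        x = x.copy()
        for i, idx in enumerate(blocks):
            x[idx] = x[idx] - (1.0 / L[i]) * grad_block(x, i)   # update block i at the current point
        return x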
|
10:35am
|
Long Talk - Shuvomoy Das Gupta (MIT)
|
Biography
Shuvomoy Das Gupta is a fourth-year Ph.D. student at the MIT Operations Research Center. His research
focuses on developing efficient algorithms for large-scale optimization and machine learning.
He obtained his M.A.Sc. from the ECE Department at the University of Toronto in 2016 and then
worked as a researcher at the Research & Technology Department of Thales Canada for three years.
His M.A.Sc. research on energy-efficient railway timetables has been applied to the largest installed
base of communication-based train control systems worldwide. Previously, he obtained a Bachelor of
Science in Electrical and Electronic Engineering from the Bangladesh University of Engineering and
Technology.
|
Design and analysis of first-order methods via nonconvex QCQP frameworks
Abstract
In this presentation, we will provide an overview of recent approaches for analyzing and designing first-order methods using nonconvex QCQPs. In this computer-assisted approach, the key idea involves posing the analysis or design of first-order methods as nonconvex but practically tractable QCQPs and then solving them to global optimality using custom spatial branch-and-bound algorithms. In particular, we will illustrate how to use this nonconvex QCQP framework to (i) analyze adaptive first-order methods, and (ii) design efficient fixed-step first-order methods.
First, we will discuss a nonconvex QCQP framework [1] to analyze the worst-case convergence of nonlinear conjugate gradient methods (NCGMs) for smooth strongly convex minimization. NCGMs are adaptive methods known for their good empirical performance but have incomplete analyses. By solving the associated nonconvex QCQPs to global optimality, we construct mathematical proofs that establish the first non-asymptotic convergence bound for Fletcher-Reeves NCGM (historically the first NCGM) and an improved non-asymptotic convergence bound for Polak-Ribière-Polyak NCGM. Conversely, from the solutions of those QCQPs we construct “bad” functions on which these NCGMs perform worse than the regular steepest descent.
Then, we will briefly discuss BnB-PEP [2], which is a framework for numerically constructing optimal fixed-step first-order methods for convex and nonconvex optimization. We will provide examples of numerically constructed optimal methods and analytical convergence proofs with potential function structures that are generated using BnB-PEP.
References:
[1] Shuvomoy Das Gupta, Robert M. Freund, Xu Andy Sun, Adrien B. Taylor, “Nonlinear conjugate gradient methods: worst-case convergence rates via computer-assisted analyses”, 2023. PDF, Code
[2] Shuvomoy Das Gupta, Bart P.G. Van Parys, Ernest K. Ryu, “Branch-and-Bound Performance Estimation Programming: A Unified Methodology for Constructing Optimal Optimization Methods”, 2022. PDF, Code
[ Preprint , Slides , Video ]
|
2:00pm
|
Long Talk - Pontus Giselsson (Lund University) - part of INMA Seminar series
|
Biography
Pontus Giselsson is an Associate Professor at the Department of Automatic Control at Lund University,
Sweden. His current research interests include mathematical optimization and its applications in, e.g.,
control, machine learning, statistical estimation, and signal processing. He received an M.Sc. degree from
Lund University in 2006 and a Ph.D. degree from Lund University in 2012.
|
This talk will be co-presented by Manu Upadhyaya (Lund University).
Tight Lyapunov function existence analysis for first-order methods
Abstract
We present a unifying framework for establishing linear convergence rates for common
first-order methods used to solve convex optimization problems. In particular, we consider i) classes
of convex optimization problems of finite sum form with (possibly strongly) convex and (possibly)
smooth functional components, and ii) first-order methods that can be written in so-called state-space
form, i.e., as a linear system in feedback interconnection with the subdifferentials of the functional
components of the objective function. The validity of a given target linear convergence rate is
established by deriving a necessary and sufficient condition for verifying the existence of a quadratic
Lyapunov function for the algorithm and problem class under consideration for the chosen rate, which
amounts to the feasibility of a small-sized semidefinite program. This allows us to find the smallest
linear convergence rate for which such a quadratic Lyapunov function exists, by bisection search,
yielding a tight procedure. The approach is numerically exemplified on several algorithmic schemes.
[ Preprint , Slides , Video ]
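A schematic view of the bisection procedure described in the abstract (the SDP feasibility oracle below is a hypothetical placeholder for the small semidefinite program mentioned above):

    # Schematic bisection on the linear rate: lyapunov_sdp_is_feasible(rate) is a hypothetical
    # placeholder for the small SDP certifying existence of a quadratic Lyapunov function.
    def smallest_certified_rate(lyapunov_sdp_is_feasible, lo=0.0, hi=1.0, tol=1e-6):
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if lyapunov_sdp_is_feasible(mid):
                hi = mid    # rate mid is certified: search for a smaller one
            else:
                lo = mid    # rate mid cannot be certified: increase it
        return hi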
|