A Variational Approach to Accelerated Optimization
Efficient optimization has become one of the major concerns in data analysis. There has been a lot of focus on first-order optimization algorithms because of their low cost per iteration. In 1983, Nesterov's Accelerated Gradient method (NAG) was shown to converge in O(1/k^2) to the minimum of the convex objective function f(x), improving on the O(1/k) convergence rate exhibited by the standard gradient descent methods, which is the phenomenon referred to as acceleration. It was shown that NAG limits to a second order ODE, as the time step goes to 0, and that the objective function f(x(t)) converges to its optimal value at a rate of $O(1/t^2) along the trajectories of this ODE. In this talk, we will discuss how the convergence of f(x(t)) can be accelerated in continuous time to an arbitrary convergence rate O(1/t^p) in normed spaces, by considering flow maps generated by a family of time-dependent Bregman Lagrangian and Hamiltonian systems which is closed under time rescaling. We will then discuss how this variational framework can be exploited together with the time-invariance property of the family of Bregman Lagrangians using adaptive geometric integrators to design efficient explicit algorithms for symplectic accelerated optimization. Finally, we will discuss briefly the generalization from normed spaces to Riemannian manifolds.