Algo Hour - Step sizes for SGD: Adaptivity and Convergence | Xiaoyu Li

Algo Hour
- San Francisco, CA

Xiaoyu Li is a Ph.D. student at Boston University, where she is advised by Professor Francesco Orabona. She received her Bachelor's degree in Math & Applied Math from the University of Science and Technology of China. She was previously a Ph.D. candidate at Stony Brook University and worked as an intern at Nokia Bell Labs. Her primary research interests lie in stochastic optimization and theoretical machine learning. She currently works on understanding and designing optimization methods in machine learning, specifically stochastic gradient descent and its variants, adaptive gradient methods, and momentum methods.

Talk Abstract:

Stochastic gradient descent (SGD) has become a popular tool for training large-scale machine learning models. Yet its performance is highly variable and depends heavily on the choice of step size. This has motivated a variety of step size tuning strategies and research on adaptive step sizes, but most of them lack theoretical guarantees. In this talk, I will present a generalized AdaGrad method with an adaptive step size and two heuristic step size schedules for SGD: the exponential step size and the cosine step size. For the first time, we provide theoretical support for them, deriving convergence guarantees and showing that these step sizes allow SGD to automatically adapt to the noise level of the stochastic gradients. I will also discuss their empirical performance and some related optimization methods.
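For readers unfamiliar with these schedules, the sketch below shows commonly used forms of exponential, cosine, and AdaGrad-style (norm-based) step sizes, and how a schedule plugs into the SGD update. It is only an illustration under assumed notation (eta0, alpha, T, and eps are placeholder names), not the exact formulations analyzed in the talk.

```python
import math
import random

# A minimal sketch of the step size schedules mentioned above, using
# commonly cited forms. Not the speaker's implementation; the parameter
# names (eta0, alpha, T, eps) are illustrative placeholders.

def exponential_step_size(t, eta0=1.0, alpha=0.99):
    """Exponentially decaying step size: eta_t = eta0 * alpha**t."""
    return eta0 * alpha ** t

def cosine_step_size(t, T, eta0=1.0):
    """Cosine-annealed step size: eta_t = (eta0 / 2) * (1 + cos(pi * t / T))."""
    return 0.5 * eta0 * (1.0 + math.cos(math.pi * t / T))

def adagrad_norm_step_size(grad_sq_sum, eta0=1.0, eps=1e-8):
    """AdaGrad-style adaptive step size that shrinks with the accumulated
    squared gradient norms: eta_t = eta0 / sqrt(eps + sum_i ||g_i||^2)."""
    return eta0 / math.sqrt(eps + grad_sq_sum)

if __name__ == "__main__":
    # Toy SGD run on f(x) = 0.5 * x**2 with noisy gradients, showing how a
    # schedule plugs into the update x <- x - eta_t * g.
    x, T, grad_sq_sum = 5.0, 100, 0.0
    for t in range(T):
        g = x + random.gauss(0.0, 0.1)   # stochastic gradient of 0.5 * x**2
        grad_sq_sum += g * g
        x -= cosine_step_size(t, T) * g  # any of the schedules above works here
    print(f"final iterate: {x:.4f}")
```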

