EC500: Optimization for Machine Learning (Spring 2021)

Efficient algorithms for training large models on large datasets have been critical to the recent successes in machine learning and deep learning. This course will introduce students to both the theoretical principles behind such algorithms and the practical considerations involved in implementing them. Topics include the convergence properties of first-order optimization techniques such as stochastic gradient descent, adaptive learning rate schemes, and momentum. Particular focus will be given to stochastic optimization problems with non-convex loss surfaces, which are typical of modern deep learning.

Syllabus with meeting time and Zoom link (BU login required)

Topics

Stochastic gradient descent.
Momentum-based optimization and accelerated gradient descent.
Adaptive gradient methods, including AdaGrad and Adam.
Normalized stochastic gradient descent, LARS and LAMB.
Large batch size optimization.
Stochastic preconditioning.
Memory-efficiency techniques.
Learning rate scheduling.
Hyperparameter tuning.
Second-order optimization and Hessian-vector products.
Variance reduction.

Prerequisites

Ability to program in Python. Some experience with linear algebra, calculus, and probability. Example concepts that should be familiar include gradients, eigenvectors, eigenvalues, Taylor series, and expectations.
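
As a rough gauge of the expected background, the short sketch below combines several of these concepts: a gradient, an expectation, and a few lines of Python. It is only an illustration and not part of the course materials. The objective, the data, the step size, and the function names (`full_gradient`, `stochastic_gradient`, `sgd`) are all assumptions chosen for the example. It runs plain stochastic gradient descent on a one-dimensional least-squares problem, where the gradient of a single uniformly sampled term equals the full gradient in expectation.

```python
import random


def full_gradient(w, xs, ys):
    """Gradient of f(w) = (1 / 2n) * sum_i (w * x_i - y_i)^2 with respect to w."""
    n = len(xs)
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / n


def stochastic_gradient(w, xs, ys, rng):
    """Gradient of one uniformly sampled term of f.

    Averaged over the random index, this equals full_gradient(w, xs, ys).
    """
    i = rng.randrange(len(xs))
    return (w * xs[i] - ys[i]) * xs[i]


def sgd(xs, ys, lr=0.05, steps=2000, seed=0):
    """Plain SGD with a constant step size (illustrative values only)."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        w -= lr * stochastic_gradient(w, xs, ys, rng)
    return w


# Hypothetical noiseless data generated by y = 3x, so the minimizer is w = 3.
xs = [0.5, 1.0, 1.5, 2.0]
ys = [3.0 * x for x in xs]
w_hat = sgd(xs, ys)
print(w_hat)
```

Because the data here are noiseless, every sampled gradient vanishes at the minimizer and a constant step size suffices. With noisy data, a constant step size would typically leave the iterate fluctuating around the minimizer, which is one reason for studying step-size schedules and variance reduction.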