EC525: Optimization for Machine Learning (Fall 2023)
Efficient algorithms for training large models on large datasets have been critical to the recent successes
in machine learning and deep learning. This course will introduce students to both the theoretical principles
behind such algorithms and practical implementation considerations.
Topics include convergence properties of first-order optimization techniques such as
stochastic gradient descent, adaptive learning rate schemes, and momentum.
Particular focus will be given to stochastic optimization problems with the non-convex loss surfaces
typical of modern deep learning.
Syllabus with meeting time and other logistical information (BU login required)
Topics
- Stochastic gradient descent (see the update-rule sketch after this list).
- Momentum-based optimization and accelerated gradient descent.
- Adaptive gradient methods, including AdaGrad and Adam.
- Normalized stochastic gradient descent, LARS, and LAMB.
- Large-batch optimization.
- Stochastic preconditioning.
- Memory-efficiency techniques.
- Learning rate scheduling.
- Hyperparameter tuning.
- Second-order optimization and Hessian-vector products.
- Variance reduction.
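As a small illustration of the first few topics, below is a minimal NumPy sketch of the SGD, heavy-ball momentum, and Adam update rules on a toy least-squares problem. This is not course code: the objective, the single-sample gradient oracle, and all hyperparameter values are illustrative assumptions.

    import numpy as np

    # Illustrative sketch only -- the toy objective, sampling scheme, and all
    # hyperparameter values below are assumptions, not course material.
    rng = np.random.default_rng(0)

    # Toy least-squares objective: f(w) = (1/n) * sum_i 0.5 * (a_i . w - b_i)^2,
    # with stochastic gradients computed from one uniformly sampled data point.
    A = rng.normal(size=(100, 5))
    b = rng.normal(size=100)

    def stochastic_grad(w):
        i = rng.integers(len(b))          # sample one data point uniformly
        return (A[i] @ w - b[i]) * A[i]   # gradient of 0.5 * (a_i . w - b_i)^2

    def sgd(w, steps=2000, lr=0.01):
        # Plain SGD: w <- w - lr * g
        for _ in range(steps):
            w = w - lr * stochastic_grad(w)
        return w

    def sgd_momentum(w, steps=2000, lr=0.01, beta=0.9):
        # Heavy-ball momentum: step along an exponential average of past gradients.
        m = np.zeros_like(w)
        for _ in range(steps):
            m = beta * m + stochastic_grad(w)
            w = w - lr * m
        return w

    def adam(w, steps=2000, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        # Adam: momentum plus a coordinate-wise step size scaled by a
        # second-moment estimate of the gradients, with bias correction.
        m, v = np.zeros_like(w), np.zeros_like(w)
        for t in range(1, steps + 1):
            g = stochastic_grad(w)
            m = beta1 * m + (1 - beta1) * g
            v = beta2 * v + (1 - beta2) * g**2
            m_hat = m / (1 - beta1**t)
            v_hat = v / (1 - beta2**t)
            w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
        return w

    for name, opt in [("sgd", sgd), ("momentum", sgd_momentum), ("adam", adam)]:
        w = opt(np.zeros(5))
        print(name, "final loss:", 0.5 * np.mean((A @ w - b) ** 2))

The course studies when and why such update rules converge, and how their behavior changes with step size, batch size, and the geometry of the loss surface.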
Prerequisites
Ability to program in Python. Comfort with linear algebra, calculus, and probability.
Example concepts that should be familiar include gradients, eigenvectors, eigenvalues,
Taylor series, and expectations. The class will require writing rigorous mathematical
proofs.
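As an illustrative example of the expected background (an assumption for illustration, not an excerpt from the course notes), the following descent lemma for smooth functions follows from a first-order Taylor expansion with the remainder controlled by the smoothness constant:

    If $f$ is $L$-smooth, i.e. $\|\nabla f(x) - \nabla f(y)\| \le L\|x - y\|$ for all $x, y$, then
    \[
        f(y) \le f(x) + \langle \nabla f(x),\, y - x \rangle + \frac{L}{2}\|y - x\|^2 .
    \]
    Taking a gradient step $y = x - \eta \nabla f(x)$ with $\eta \le 1/L$ gives
    \[
        f(y) \le f(x) - \frac{\eta}{2}\|\nabla f(x)\|^2 ,
    \]
    the basic inequality behind many first-order convergence proofs.

Students should be comfortable reading and writing arguments at this level of rigor.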
Course notes will be updated as the course progresses. You may need to manually refresh to clear cached versions of the notes.
This course involves a lot of proofs, both in lecture and on the homework.
If you're not confident about proof-writing, these
exercises may provide some practice.
These are "general" abstract math problems are not particularly related to the content of the course other than requiring proof writing.