ATCS – Selected Topics in Learning, Prediction, and Optimization (with applications in Finance)
2022 Spring
Lecturer: Jian Li ( lapordge at gmail dot com)
TA:
Time: every Monday, 9:50am–12:15pm
Room: 4403, Fourth Teaching Building (四教)
We intend to cover a subset of the following topics (tentative):
(1) I assume you already know all the basics (convex optimization and machine learning, stochastic gradient descent, gradient boosting, deep learning basics, CNNs, RNNs; please see my undergrad course). If you don't know much machine learning (e.g., you do not know how to derive the dual of SVM yet), please do NOT take this course. I will recall some concepts briefly when necessary.
(2) Online learning/optimization, multi-armed bandits, statistical learning theory, theory of deep learning.
(3) I will talk about some (new) notions in ML: robustness, explainable AI, fairness, calibration.
I won't strictly follow the above order. I may skip some topics mentioned above and cover others not mentioned. This is a graduate course.
I will be talking about several applications of ML and optimization in finance (trading, pricing derivatives, etc.), and of course in typical CS areas such as vision, NLP, and social networks as well.
I will teach about 2/3–3/4 of the classes. For the rest, I will choose some topics and students will give class presentations.
Tentative topics for class presentations: generative models (GANs), adversarial learning and robustness, unsupervised learning (co-training, pseudo-labeling, contrastive learning), meta-learning, AutoML, and various financial applications.
Basic machine learning knowledge is a must; see Andrew Ng's undergrad lecture notes.
The course may use various math tools from convex optimization, spectral theory, matrix perturbation, probability, high-dimensional geometry, functional analysis, Fourier analysis, real algebraic geometry, stochastic differential equations, information theory, and so on. Only standard CS undergrad math and machine learning knowledge are required; otherwise the course will be self-contained. However, a certain level of mathematical maturity is required.
Some knowledge of convex optimization may be useful. See this course (by S. Boyd) and a previous course by myself. But it will be fine if you didn't take those courses.
The course is a blending of theory and practice. We will cover both the underlying mathematics as well as interesting heuristics.
Grading:
Schedule:
1. Feb 21
Basics of convex optimization; strong convexity; smoothness; convergence analysis of gradient descent; basics of online learning; regret
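As a warm-up for the convergence analysis above, here is a minimal sketch of gradient descent on a smooth, strongly convex quadratic. The matrix, vector, and iteration count are illustrative choices, not from the lecture; the step size 1/L (with L the largest eigenvalue, i.e., the smoothness constant) is the standard safe choice.

```python
import numpy as np

# Gradient descent on f(x) = 0.5 * x^T A x - b^T x, which is
# mu-strongly convex and L-smooth with mu = lambda_min(A), L = lambda_max(A).
A = np.array([[3.0, 1.0], [1.0, 2.0]])   # symmetric positive definite (demo values)
b = np.array([1.0, 1.0])

def grad(x):
    return A @ x - b

L = np.linalg.eigvalsh(A).max()          # smoothness constant
x = np.zeros(2)
for _ in range(200):
    x = x - (1.0 / L) * grad(x)          # constant step size 1/L

x_star = np.linalg.solve(A, b)           # exact minimizer for comparison
```

With strong convexity, the iterates converge linearly, so 200 steps already place `x` extremely close to `x_star`.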

2. Feb 28
Online convex optimization; online gradient descent; the expert problem; conjugate gradient method
Readings: Online Convex Programming and Generalized Infinitesimal Gradient Ascent (M. Zinkevich). For conjugate gradient: An Introduction to the Conjugate Gradient Method Without the Agonizing Pain (lots of pictures that help you think about the geometry). Our lecture follows the succinct exposition in Faster Algorithms via Approximation Theory. The optimal rate can also be achieved by the momentum method for general strongly convex functions; see a very nice blog post on why momentum works (with animations!), and https://web.stanford.edu/class/msande318/notes/notesfirstordersmooth.pdf
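Online gradient descent (Zinkevich's algorithm above) fits in a few lines. This sketch uses simple quadratic losses on the interval [-1, 1] and the step-size schedule eta_t = 1/sqrt(t); the loss family, domain, and horizon are illustrative choices, and the comparator is approximated on a grid.

```python
import numpy as np

# Online gradient descent: at round t we play x_t, the adversary reveals
# a convex loss f_t(x) = (x - z_t)^2, and we take a projected gradient step.
rng = np.random.default_rng(0)
z = rng.uniform(-1, 1, size=1000)        # adversary's hidden sequence (demo data)

x, losses = 0.0, []
for t, z_t in enumerate(z, start=1):
    losses.append((x - z_t) ** 2)
    g = 2 * (x - z_t)                    # gradient of f_t at x_t
    x = np.clip(x - g / np.sqrt(t), -1, 1)   # step eta_t = 1/sqrt(t), project to [-1, 1]

# Regret against the best fixed point in hindsight (grid approximation).
best = np.min([np.sum((u - z) ** 2) for u in np.linspace(-1, 1, 201)])
regret = np.sum(losses) - best
```

Zinkevich's analysis gives regret O(sqrt(T)), so the average regret `regret / T` shrinks toward zero as the horizon grows.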
3. Mar 7
Zero-sum games; von Neumann's minimax theorem; Blackwell approachability; internal regret; universal portfolios
Readings: Regret in the On-Line Decision Problem; Universal Portfolios With and Without Transaction Costs
4. Mar 14
Blackwell approachability is equivalent to online linear optimization; calibration; adversarial bandit model; EXP3
Readings: Blackwell Approachability and No-Regret Learning are Equivalent; Regret in the On-Line Decision Problem. For EXP3, we use the analysis from this lecture note (by Haipeng Luo)
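A minimal sketch of EXP3 for the adversarial bandit model above. For the demo the rewards are drawn from fixed Bernoulli arms, and the learning rate eta = sqrt(2 ln K / (K T)) is one standard tuning; both are illustrative choices, not taken from the lecture note.

```python
import numpy as np

# EXP3: sample an arm from an exponential-weights distribution and update
# an importance-weighted estimate of each arm's cumulative reward.
rng = np.random.default_rng(0)
K, T = 5, 5000
means = np.array([0.2, 0.4, 0.5, 0.3, 0.7])    # demo Bernoulli arms
eta = np.sqrt(2 * np.log(K) / (K * T))

S = np.zeros(K)                                 # importance-weighted reward sums
picks = np.zeros(K, dtype=int)
for _ in range(T):
    w = np.exp(eta * (S - S.max()))             # shift by max for numerical stability
    p = w / w.sum()
    a = rng.choice(K, p=p)
    r = rng.binomial(1, means[a])               # reward in [0, 1]
    S[a] += r / p[a]                            # unbiased estimate: E[r/p_a] = mean_a
    picks[a] += 1
```

Since the estimate `r / p[a]` is unbiased, the weights of better arms grow faster, and the best arm tends to collect well more than its uniform share of pulls.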
5. Mar 21
Multi-armed bandit model; UCB (upper confidence bound); median elimination; epsilon-greedy; contextual bandit model; EXP4; Langford–Zhang's epsilon-greedy; online-to-batch conversion
Readings: Finite-time Analysis of the Multiarmed Bandit Problem; Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems. Our lecture for contextual bandits follows the exposition in this lecture note (by Haipeng Luo). Optional: Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
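The UCB1 rule from "Finite-time Analysis of the Multiarmed Bandit Problem" can be sketched directly: pull the arm maximizing empirical mean plus the confidence bonus sqrt(2 ln t / n_i). The arm means and horizon below are demo values.

```python
import numpy as np

# UCB1 on Bernoulli arms: optimism in the face of uncertainty.
rng = np.random.default_rng(0)
means = np.array([0.3, 0.5, 0.6, 0.4])          # demo arm means; best is index 2
K, T = len(means), 20000

n = np.zeros(K)        # pull counts
s = np.zeros(K)        # reward sums
for t in range(1, T + 1):
    if t <= K:
        a = t - 1                               # initialize: pull each arm once
    else:
        ucb = s / n + np.sqrt(2 * np.log(t) / n)
        a = int(ucb.argmax())                   # optimistic index
    n[a] += 1
    s[a] += rng.binomial(1, means[a])
```

The finite-time analysis bounds each suboptimal arm's pulls by O(ln T / gap^2), so the best arm accumulates the large majority of the pulls.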
6. Mar 28
Maxima of sub-Gaussian processes; chaining; empirical processes; symmetrization; VC dimension
Readings: [Book] Probability in High Dimension, Ch. 5.1–5.3 and Ch. 7.1–7.2. Optional reading: Talagrand's generic chaining, Ch. 6
7. Apr 4
Pseudo-dimension; fat-shattering dimension; Rademacher complexity
Readings: Scale-Sensitive Dimensions, Uniform Convergence, and Learnability. See also [Book] Probability in High Dimension, Ch. 7.3
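For linear functions with a bounded weight norm, the supremum inside the empirical Rademacher complexity has a closed form, which makes a quick Monte Carlo check possible. The class (unit-norm linear functions), sample, and trial counts below are illustrative choices.

```python
import numpy as np

# Monte Carlo estimate of the empirical Rademacher complexity
#   R_S(F) = E_sigma [ sup_{f in F} (1/m) * sum_i sigma_i f(x_i) ]
# for F = { x -> <w, x> : ||w||_2 <= 1 }, where the sup equals
# || (1/m) * sum_i sigma_i x_i ||_2 (Cauchy-Schwarz, tight at w parallel).
rng = np.random.default_rng(0)
m, d = 200, 5
X = rng.normal(size=(m, d))                    # a fixed sample S (demo data)

trials, est = 2000, 0.0
for _ in range(trials):
    sigma = rng.choice([-1.0, 1.0], size=m)    # Rademacher signs
    est += np.linalg.norm(sigma @ X / m)       # closed-form sup over the unit ball
est /= trials
```

The estimate scales like O(sqrt(d/m)), matching the standard bound R_S(F) <= max_i ||x_i|| / sqrt(m) up to constants.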
8. Apr 11
Margin theory; algorithmic stability; uniform stability and generalization; uniform stability of SGD in the convex case (beta-smooth)
Readings: [Book] Foundations of Machine Learning, Ch. 4; Train faster, generalize better: Stability of stochastic gradient descent
9. Apr 18
Uniform stability of SGD in the strongly convex case; uniform stability of SGD in the nonconvex case; the random-label experiment; uniform stability of Gradient Langevin Dynamics (GLD); convex learnability; linear functions are not learnable without assumptions; on-average-replace-one stability; regularized empirical risk minimization for convex learning (beta-smooth case)
Readings: Train faster, generalize better: Stability of stochastic gradient descent; Understanding deep learning requires rethinking generalization; On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning; [Book] Understanding Machine Learning: From Theory to Algorithms, Ch. 12–13. Optional reading: [Book] Understanding Machine Learning: From Theory to Algorithms, Ch. 14.5; Generalization Bounds of SGLD for Non-convex Learning
10. Apr 25
PAC-Bayesian theory; compression-based generalization; minimum description length principle
Readings: The version of the PAC-Bayes theorem we proved in class is from Understanding Machine Learning: From Theory to Algorithms, Ch. 31. See also: Exploring Generalization in Deep Learning; A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks; Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data; Stronger Generalization Bounds for Deep Nets via a Compression Approach. Optional reading: a good introduction to Kolmogorov complexity can be found in Cover's Elements of Information Theory; A Tutorial Introduction to the Minimum Description Length Principle
11. May 9
Brownian motion; stochastic differential equations (SDEs); Itô processes; Itô's lemma; Langevin dynamics; the OU process; PDEs and SDEs; the Feynman–Kac formula
Optional reading: The Fokker–Planck equation; Stochastic Calculus, Filtering, and Stochastic Control
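The OU process above is a convenient first SDE to simulate, since its stationary distribution is known in closed form. Here is a minimal Euler–Maruyama sketch; the coefficients, step size, and path counts are arbitrary demo choices.

```python
import numpy as np

# Euler-Maruyama simulation of the Ornstein-Uhlenbeck SDE
#   dX_t = -theta * X_t dt + sigma dW_t,
# whose stationary distribution is N(0, sigma^2 / (2 * theta)).
rng = np.random.default_rng(0)
theta, sigma = 1.0, 0.5
dt, n_steps, n_paths = 0.01, 5000, 2000

x = np.zeros(n_paths)                          # all paths start at 0
for _ in range(n_steps):
    dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)   # Brownian increment ~ N(0, dt)
    x += -theta * x * dt + sigma * dw                 # one Euler-Maruyama step
```

After T = 50 time units the paths have mixed, and the empirical variance across paths matches the stationary variance sigma^2 / (2 theta) = 0.125 up to sampling and discretization error.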
12. May 16
Adjoint operators; the Fokker–Planck equation; optimization and generalization of Langevin processes; SGD and flat minima
Readings: Generalization Bounds of SGLD for Non-convex Learning. Optional reading: The Fokker–Planck equation
13. May 23
Graph Laplacian; Cheeger's inequality; conductance; spectral clustering; contrastive loss as matrix factorization
Readings: On Spectral Clustering: Analysis and an algorithm; Partitioning Well-Clustered Graphs: Spectral Clustering Works!; Provable guarantees for self-supervised deep learning with spectral contrastive loss
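Spectral clustering via the graph Laplacian can be demonstrated on a toy two-block random graph: the sign pattern of the Fiedler vector (the eigenvector for the second-smallest Laplacian eigenvalue) recovers the blocks. The block sizes and edge probabilities below are arbitrary demo choices.

```python
import numpy as np

# Two-block stochastic block model: dense within blocks, sparse across.
rng = np.random.default_rng(0)
n = 40                                         # nodes per block
p_in, p_out = 0.8, 0.05                        # demo edge probabilities
blocks = np.repeat([0, 1], n)
P = np.where(blocks[:, None] == blocks[None, :], p_in, p_out)
A = (rng.uniform(size=(2 * n, 2 * n)) < P).astype(float)
A = np.triu(A, 1)
A = A + A.T                                    # symmetric adjacency, no self-loops

L = np.diag(A.sum(axis=1)) - A                 # unnormalized graph Laplacian D - A
eigvals, eigvecs = np.linalg.eigh(L)
fiedler = eigvecs[:, 1]                        # eigenvector of 2nd-smallest eigenvalue
labels = (fiedler > 0).astype(int)             # split by sign

# Agreement with the planted blocks, up to relabeling of the two clusters.
agree = max((labels == blocks).mean(), (labels != blocks).mean())
```

With this much separation between p_in and p_out, the cut found by the sign split is essentially the planted partition, which is the low-conductance cut that Cheeger's inequality relates to the second eigenvalue.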
14. May 30
Topic modeling; nonnegative matrix factorization; anchor words; word2vec; negative sampling/contrastive loss; word embedding as implicit matrix factorization; domain adaptation; fairness; (multi-)calibration; differential privacy, generalization, and adaptive data analysis (very brief)
Readings: Learning Topic Models – Going Beyond SVD; Neural Word Embedding as Implicit Matrix Factorization; A theory of learning from different domains; On Calibration of Modern Neural Networks; Multicalibration: Calibration for the (Computationally-Identifiable) Masses. Optional reading: Advances in Domain Adaptation Theory; Outcome Indistinguishability
15. Jun 6
Final project presentations
References:
[Book] Introduction to online convex optimization
[Book] Prediction, Learning, and Games
[Book] Options, Futures, and Other Derivatives
[Book] Advances in Financial Machine Learning
[Book] Convex Optimization
[Book] Foundations of Machine Learning
[Book] Understanding Machine Learning: From Theory to Algorithms
Python is the default programming language we will use in the course.
If you haven't used it before, don't worry. It is very easy to learn (if you know any other programming language) and is a very efficient language, especially
for prototyping things related to scientific computing and numerical optimization. Python code is usually much shorter than C/C++ code (the language does a lot for you). It is also more flexible and generally faster than MATLAB.
A standard combination for this class is Python + numpy (a numerical library for Python) + scipy (a scientific computing library for Python) + matplotlib (for generating nice plots).
Another, somewhat easier, option is to install Anaconda (a free Python distribution that bundles most popular packages).
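To give a taste of the numpy workflow mentioned above, here is a minimal example fitting a line by least squares; the data and noise level are made up for the demo.

```python
import numpy as np

# Fit y = a*x + b by least squares with numpy.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 50)
y = 2.0 * x + 1.0 + 0.01 * rng.normal(size=50)   # noisy line: slope 2, intercept 1

A = np.stack([x, np.ones_like(x)], axis=1)        # design matrix [x, 1]
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)    # solve min ||A w - y||_2
```

Three lines of setup and one library call replace what would be a substantial amount of C/C++ code, which is the point made above.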