ATCS – Selected Topics in Learning, Prediction, and Optimization (with applications in Finance)

2022 Spring

Lecturer: Jian Li (lapordge at gmail dot com)

TA:

Time: every Monday 9:50am-12:15pm

Room: 4403, Fourth Teaching Building (四教)


We intend to cover a subset of the following topics (tentative):

(1) I assume you already know the basics (convex optimization and machine learning, stochastic gradient descent, gradient boosting, deep learning basics, CNNs, RNNs; please see my undergrad course). If you don't know much machine learning (e.g., you do not yet know how to derive the dual of SVM), please do NOT take this course. I will briefly recall some concepts when necessary.

(2) Online learning/optimization, multi-armed bandits, statistical learning theory, theory of deep learning.

(3) Some (new) notions in ML: robustness, explainable AI, fairness, calibration.

I won't strictly follow the above order. I may skip some topics mentioned above and cover others not mentioned. It is a graduate course.

I will be talking about several applications of ML and optimization in finance (trading, pricing derivatives, etc.), and of course in typical CS areas such as vision, NLP, and social networks as well.

I will teach about 2/3 to 3/4 of the classes. For the rest, I will choose some topics and students will give class presentations.

Tentative topics for class presentation: generative models (GANs), adversarial learning and robustness, unsupervised learning (co-training, pseudo-labeling, contrastive learning), meta-learning, AutoML, various financial applications.

Basic machine learning knowledge is a must; see, e.g., Andrew Ng's undergrad lecture notes.

The course may use various math tools from convex optimization, spectral theory, matrix perturbation, probability, high-dimensional geometry, functional analysis, Fourier analysis, real algebraic geometry, stochastic differential equations, information theory, and so on. Only standard CS undergrad math and machine learning knowledge is required; otherwise the course will be self-contained. But a certain level of mathematical maturity is required.

Some knowledge of convex optimization may be useful. See this course (by S. Boyd) and a previous course of mine. But it is fine if you did not take those courses.

The course is a blend of theory and practice. We will cover both the underlying mathematics and interesting heuristics.

 


Grading:

  1. Homework (35 pts; one homework every two or three weeks)
  2. 5 pts for taking notes: each student should take notes for at least one lecture (maybe two), using LaTeX (use this template: sample.tex, algorithm2e.sty).
  3. 5 pts for the student presentation: the last 3-4 classes will be student presentations. Each student needs to present for around 30 minutes. I will provide some candidate topics/papers.
  4. Course project (55 pts: 5 pts for the mid-term report, 15 pts for the final presentation, 35 pts for the final report)
  5. No exam. 

 


Schedule:

 

1. Feb 21
Basics of Convex optimization
Strong convexity, Smoothness
Convergence analysis of Gradient Descent (a small numerical sketch follows after the readings)
Basics of Online learning, regret
  1. [Book] Convex Optimization, Sections 9.1-9.3
  2. Basic concepts: norms
  3. Basic concepts: online learning, regret
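
The sketch referenced above is a minimal, self-contained illustration (not taken from the readings) of the geometric convergence of gradient descent on a smooth, strongly convex function. The quadratic objective, dimension, and step size 1/L are illustrative choices.

```python
import numpy as np

# Illustrative example: gradient descent on f(x) = 0.5 * x^T A x with A positive
# definite, so f is mu-strongly convex and L-smooth with mu = lambda_min(A),
# L = lambda_max(A), and minimum value 0 at x = 0.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((20, 20)))
eigs = np.linspace(1.0, 10.0, 20)          # mu = 1, L = 10
A = Q @ np.diag(eigs) @ Q.T

f = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x
mu, L = eigs[0], eigs[-1]

x = rng.standard_normal(20)
step = 1.0 / L                             # standard step size for an L-smooth function
for t in range(50):
    x = x - step * grad(x)

# Theory predicts f(x_t) <= (1 - mu/L)^t * f(x_0); after 50 steps the
# suboptimality has dropped by several orders of magnitude.
print("f(x_50) =", f(x))
```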

 

2. Feb 28 Online convex optimization
Online gradient descent
Expert problem (a small Hedge sketch follows after the readings)
Conjugate gradient method
Online Convex Programming and Generalized Infinitesimal Gradient Ascent (M. Zinkevich)
Conjugate gradient:
An Introduction to the Conjugate Gradient Method Without the Agonizing Pain (lots of pictures that help you think about the geometry)
Our lecture follows the succinct exposition in

Faster Algorithms via Approximation Theory

 

The optimal rate can also be achieved by the momentum method for general strongly convex functions.
A very nice blog post on why momentum works (with animations!)
See https://web.stanford.edu/class/msande318/notes/notes-first-order-smooth.pdf
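
The Hedge sketch referenced above: a minimal run of the multiplicative-weights (Hedge) algorithm on synthetic losses. The loss sequence, number of experts, and learning-rate tuning are illustrative assumptions, not taken from the references for this lecture.

```python
import numpy as np

# Hedge / multiplicative weights for the expert problem, with losses in [0, 1].
rng = np.random.default_rng(0)
T, N = 1000, 10                       # rounds and number of experts (arbitrary choices)
losses = rng.uniform(size=(T, N))     # synthetic losses; expert 0 is slightly better
losses[:, 0] *= 0.8

eta = np.sqrt(2.0 * np.log(N) / T)    # a standard tuning of the learning rate
weights = np.ones(N)
alg_loss = 0.0
for t in range(T):
    p = weights / weights.sum()              # distribution over experts
    alg_loss += p @ losses[t]                # expected loss of the algorithm
    weights *= np.exp(-eta * losses[t])      # multiplicative update

best_expert_loss = losses.sum(axis=0).min()
# The regret should be O(sqrt(T log N)), i.e., sublinear in T.
print("regret:", alg_loss - best_expert_loss)
```
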
3. Mar 7
Zero-sum games, von Neumann's minimax theorem
Blackwell approachability
Internal regret
Universal portfolio (a toy sketch follows after the readings)
Regret in the On-Line Decision Problem

Universal Portfolios With and Without Transaction Costs
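
The toy sketch referenced above: for two assets, the wealth of Cover's universal portfolio with a uniform prior equals the average of the wealths of all constant-rebalanced portfolios (CRPs), which we approximate on a grid. The synthetic price relatives and grid resolution are illustrative assumptions, not from the papers.

```python
import numpy as np

# Cover's universal portfolio for 2 assets, approximated on a grid over b in [0, 1].
rng = np.random.default_rng(0)
T = 250
# Price relatives x_t (today's price / yesterday's price) for two synthetic assets.
x = np.column_stack([
    np.exp(rng.normal(0.0005, 0.01, T)),   # asset 1
    np.exp(rng.normal(0.0002, 0.02, T)),   # asset 2
])

def crp_wealth(b):
    """Final wealth of the constant-rebalanced portfolio (b, 1 - b), starting from 1."""
    return np.prod(b * x[:, 0] + (1 - b) * x[:, 1])

grid = np.linspace(0.0, 1.0, 101)
wealths = np.array([crp_wealth(b) for b in grid])

# The universal portfolio's wealth is the prior-average of the CRP wealths.
print("universal portfolio wealth :", wealths.mean())
print("best CRP in hindsight      :", wealths.max())
```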

4. Mar 14
Blackwell approachability is equivalent to online linear optimization
Calibration
Adversarial bandit model
EXP3 (a minimal sketch follows after the readings)
Blackwell Approachability and No-Regret Learning are Equivalent

Regret in the On-Line Decision Problem

For EXP3, we use the analysis from this lecture note (by Haipeng Luo)
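
The EXP3 sketch referenced above uses the standard loss-based, importance-weighted form. The synthetic loss sequence and the learning-rate choice are illustrative assumptions, not the exact tuning from the lecture note.

```python
import numpy as np

# Minimal EXP3 sketch for the adversarial bandit model, with losses in [0, 1].
rng = np.random.default_rng(0)
T, K = 5000, 5
losses = rng.uniform(size=(T, K))
losses[:, 2] -= 0.2                       # arm 2 is slightly better (lower loss)
losses = np.clip(losses, 0.0, 1.0)

eta = np.sqrt(np.log(K) / (T * K))        # one standard learning-rate choice
log_w = np.zeros(K)                       # keep log-weights for numerical stability
total_loss = 0.0
for t in range(T):
    p = np.exp(log_w - log_w.max())
    p /= p.sum()
    arm = rng.choice(K, p=p)
    loss = losses[t, arm]
    total_loss += loss
    # Importance-weighted (unbiased) loss estimate: only the played arm's
    # coordinate is nonzero.
    log_w[arm] -= eta * loss / p[arm]

best_arm_loss = losses.sum(axis=0).min()
# EXP3's expected regret is O(sqrt(T * K * log K)), sublinear in T.
print("regret:", total_loss - best_arm_loss)
```
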
5. Mar 21
Multi-armed bandit model.
UCB (upper confidence bound); a minimal UCB1 sketch follows after the readings
Median elimination
epsilon-greedy
Contextual bandit model
EXP4, Langford-Zhang's epsilon-greedy
Online-to-Batch
Finite-time Analysis of the Multiarmed Bandit Problem

Action Elimination and Stopping Conditions for the Multi-Armed Bandit and Reinforcement Learning Problems

Our lecture for contextual bandits follows the exposition in this lecture note (by Haipeng Luo)

Optional: Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits
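
The UCB1 sketch referenced above: a minimal implementation for the stochastic multi-armed bandit model with Bernoulli rewards. The arm means, horizon, and exploration constant are illustrative choices, not from the papers above.

```python
import numpy as np

# UCB1: play each arm once, then the arm with the highest upper confidence bound.
rng = np.random.default_rng(0)
means = np.array([0.3, 0.5, 0.6, 0.45])   # true Bernoulli means, unknown to the algorithm
K, T = len(means), 10000

counts = np.zeros(K)
sums = np.zeros(K)
for t in range(T):
    if t < K:
        arm = t                            # initialization: pull each arm once
    else:
        ucb = sums / counts + np.sqrt(2.0 * np.log(t) / counts)
        arm = int(np.argmax(ucb))
    reward = float(rng.random() < means[arm])
    counts[arm] += 1
    sums[arm] += reward

# Pseudo-regret relative to always playing the best arm; it should grow only
# logarithmically in T.
print("pseudo-regret:", T * means.max() - counts @ means)
print("pull counts  :", counts)
```
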
6. Mar 28 Maxima of sub-Gaussian processes, Chaining
Empirical Process
Symmetrization
VC-dimension
[Book] Probability in High Dimension, Ch 5.1-5.3, Ch 7.1-7.2

Optional reading: Talagrand's generic chaining Ch. 6
7. Apr 2 Pseudo-dimension
Fat-Shattering Dimension
Rademacher complexity (a small Monte Carlo sketch follows after the readings)

Scale-Sensitive Dimensions, Uniform Convergence, and Learnability

See also
[Book] Probability in High Dimension, Ch 7.3
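
The Monte Carlo sketch referenced above estimates the empirical Rademacher complexity of the class of threshold functions h_a(x) = 1{x <= a} on a fixed one-dimensional sample. The sample size and number of Monte Carlo draws are illustrative choices.

```python
import numpy as np

# Empirical Rademacher complexity of thresholds, estimated by Monte Carlo:
#   R_hat = E_sigma [ sup_a (1/n) * sum_i sigma_i * 1{x_i <= a} ].
rng = np.random.default_rng(0)
n = 200
x = np.sort(rng.uniform(size=n))           # fixed sample x_1 <= ... <= x_n

def sup_correlation(sigma):
    """The sup over a is attained at one of the n + 1 prefixes of the sorted sample."""
    prefix = np.concatenate([[0.0], np.cumsum(sigma)])
    return prefix.max() / n

draws = [sup_correlation(rng.choice([-1.0, 1.0], size=n)) for _ in range(2000)]
# Thresholds have VC dimension 1, so the estimate should be of order 1/sqrt(n)
# (up to a log factor; O(1/sqrt(n)) exactly via chaining).
print("estimated empirical Rademacher complexity:", np.mean(draws))
```
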
8. Apr 11 Margin Theory

Algorithmic Stability, Uniform stability and generalization

Uniform stability of SGD in the convex case (beta-smooth case)
[Book] Foundations of Machine Learning, Ch. 4

Train faster, generalize better: Stability of stochastic gradient descent
9. Apr 18 Uniform stability of SGD in the strongly convex case
Uniform stability of SGD in the nonconvex case
Random label experiment
Uniform stability of Gradient Langevin Dynamics (GLD)
Convex Learnability
Linear functions are not learnable without assumptions
On-average-replace-one stability 
Regularized Empirical risk minimization for convex learning (beta-smooth case)
Train faster, generalize better: Stability of stochastic gradient descent

Understanding deep learning requires rethinking generalization



On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning

[Book] Understanding Machine Learning: From Theory to Algorithms Ch 12, 13

Optional reading:

[Book] Understanding Machine Learning: From Theory to Algorithms Ch 14.5

Generalization Bounds of SGLD for Non-convex Learning

10. Apr 25 PAC-Bayesian Theory

Compression-based generalization

Minimum Description Length Principle
The version of the PAC-Bayes theorem we proved in class is from Understanding Machine Learning: From Theory to Algorithms [Chapter 31]

Exploring Generalization in Deep Learning

A PAC-Bayesian Approach to Spectrally-Normalized Margin Bounds for Neural Networks

Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data

Stronger Generalization Bounds for Deep Nets via a Compression Approach

Optional reading:

A good introduction to Kolmogorov complexity can be found in Cover's Elements of Information Theory book

A Tutorial Introduction to the Minimum Description Length Principle
11. May 9 Brownian motion, Stochastic differential equations (SDEs), Ito processes, Ito's lemma

Langevin Dynamics, OU process (an Euler-Maruyama sketch follows after the readings)

PDE and SDE, Feynman-Kac formula

optional reading:
The Fokker-Planck equation
Stochastic Calculus, Filtering, and Stochastic Control
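
The Euler-Maruyama sketch referenced above simulates the Ornstein-Uhlenbeck process and checks the empirical moments against the stationary distribution N(0, sigma^2 / (2 theta)). The parameters, step size, and number of paths are illustrative choices.

```python
import numpy as np

# Euler-Maruyama for the OU process  dX_t = -theta * X_t dt + sigma dW_t.
rng = np.random.default_rng(0)
theta, sigma = 1.0, 0.5
T, n_steps, n_paths = 5.0, 1000, 10000
dt = T / n_steps

X = np.full(n_paths, 2.0)                 # start all paths at X_0 = 2
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), n_paths)
    X = X - theta * X * dt + sigma * dW

# By time T = 5 the process is essentially at stationarity.
print("empirical mean, variance:", X.mean(), X.var())
print("stationary variance     :", sigma**2 / (2 * theta))
```
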
12. May 16 Adjoint operator
Fokker-Planck equation
Optimization and Generalization of Langevin processes
SGD and flat minima
Generalization Bounds of SGLD for Non-convex Learning

optional reading:
The Fokker-Planck equation
13. May 23
Graph Laplacian
Cheeger's inequality, Conductance.
Spectral clustering (a minimal sketch follows after the readings)
Contrastive loss as matrix factorization

On Spectral Clustering: Analysis and an algorithm 

Partitioning Well-Clustered Graphs: Spectral Clustering Works

Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss
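
The spectral clustering sketch referenced above, in the spirit of the Ng-Jordan-Weiss algorithm: build a Gaussian similarity graph, embed each point by the bottom eigenvectors of the normalized Laplacian, and run k-means in the embedded space. The synthetic two-circles data, kernel bandwidth, and use of scipy's kmeans2 are illustrative choices.

```python
import numpy as np
from scipy.cluster.vq import kmeans2

rng = np.random.default_rng(0)
# Two noisy concentric circles: a case where plain k-means in input space fails.
angles = rng.uniform(0, 2 * np.pi, 200)
radii = np.repeat([1.0, 3.0], 100)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
X += rng.normal(scale=0.1, size=X.shape)

# Gaussian similarities and normalized Laplacian L = I - D^{-1/2} W D^{-1/2}.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
W = np.exp(-sq_dists / (2 * 0.3 ** 2))
np.fill_diagonal(W, 0.0)
d = W.sum(axis=1)
L = np.eye(len(X)) - (W / np.sqrt(d)[:, None]) / np.sqrt(d)[None, :]

# Embed by the eigenvectors of the 2 smallest eigenvalues, then run k-means.
eigvals, eigvecs = np.linalg.eigh(L)
U = eigvecs[:, :2]
U = U / np.linalg.norm(U, axis=1, keepdims=True)   # row normalization (NJW step)
_, labels = kmeans2(U, 2, minit="++", seed=0)
print("cluster sizes:", np.bincount(labels))
```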

14. May 30 Topic modeling, Nonnegative matrix factorization, Anchor words
Word2Vec, negative sampling/contrastive loss
Word Embedding as implicit matrix factorization

Domain adaptation

Fairness, (multi-)calibration (a small calibration sketch follows after the readings)

Differential privacy, generalization, adaptive data analysis (very brief)

Learning Topic Models – Going Beyond SVD
Neural Word Embedding as Implicit Matrix Factorization

A theory of learning from different domains



On Calibration of Modern Neural Networks
Multicalibration: Calibration for the (Computationally-Identifiable) Masses

optional reading:
Advances in Domain Adaptation Theory
Outcome indistinguishability 
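
The calibration sketch referenced above estimates the expected calibration error (ECE) of a binary predictor with equal-width confidence bins, in the spirit of the reliability diagrams in "On Calibration of Modern Neural Networks". The synthetic predictors and the number of bins are illustrative choices, and this is the binary-label variant of ECE rather than the multi-class one.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000
true_p = rng.uniform(size=n)                       # true probability of label 1
labels = (rng.uniform(size=n) < true_p).astype(float)
# A deliberately overconfident predictor: pushes probabilities toward 0 and 1.
pred_p = np.clip(0.5 + 1.5 * (true_p - 0.5), 0.0, 1.0)

def ece(pred, y, n_bins=10):
    """Expected calibration error with equal-width bins over the predicted probability."""
    bins = np.minimum((pred * n_bins).astype(int), n_bins - 1)
    err = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            err += mask.mean() * abs(pred[mask].mean() - y[mask].mean())
    return err

print("ECE of the calibrated predictor   :", ece(true_p, labels))
print("ECE of the overconfident predictor:", ece(pred_p, labels))
```
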
15. Jun 6 Final project presentations

 

 


References:

[Book] Introduction to Online Convex Optimization

[Book] Prediction, Learning, and Games

[Book] Options, Futures and Other Derivatives  

[Book] Advances in Financial Machine Learning

[Book] Convex Optimization

[Book] Foundations of Machine Learning

[Book] Probability in High Dimension

[Book] Understanding Machine Learning: From Theory to Algorithms

Lecture notes for STAT928: Statistical Learning and Sequential Prediction 


Python is the default programming language we will use in the course.

If you haven't used it before, don't worry. It is very easy to learn (if you know any other programming language) and is a very efficient language, especially for prototyping things related to scientific computing and numerical optimization. Python code is usually much shorter than C/C++ code (the language has done a lot for you). It is also more flexible and generally faster than MATLAB.

A standard combination for this class is Python + numpy (a numerical library for Python) + scipy (a scientific computing library for Python) + matplotlib (for generating nice plots). A small example follows below.

Another, somewhat easier, way is to install Anaconda (a free Python distribution that comes with most popular packages).
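
The example referenced above is purely illustrative: it minimizes a simple quadratic with scipy and plots a one-dimensional slice with matplotlib. The objective function and output file name are arbitrary choices.

```python
import numpy as np
from scipy.optimize import minimize
import matplotlib.pyplot as plt

# Minimize a simple convex quadratic with scipy (BFGS is the default method here).
f = lambda x: (x[0] - 1) ** 2 + 10 * (x[1] + 2) ** 2
result = minimize(f, x0=np.zeros(2))
print("minimizer found by scipy:", result.x)       # should be close to (1, -2)

# Plot a 1-d slice of the objective with matplotlib.
xs = np.linspace(-3, 3, 200)
plt.plot(xs, (xs - 1) ** 2)                        # f along the slice x[1] = -2
plt.xlabel("x[0]")
plt.ylabel("f(x[0], -2)")
plt.savefig("demo.png")                            # or plt.show()
```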