ATCS – Selected Topics in Learning, Prediction, and Optimization (with applications in Finance)

2023 Spring

Lecturer: Jian Li ( lapordge at gmail dot com)

TA: Zeren Tan

Time: every Monday, 9:50am-12:15pm

Room: 4306, Teaching Building 4 (四教)


We intend to cover a subset of the following topics (tentative):

(1) statistical learning theory
(2) theory of deep learning
(3) some (new) topics in ML: diffusion models, robustness, explainable AI, fairness, calibration

Note on prerequisites: I assume you already know all the basics (convex optimization and machine learning, stochastic gradient descent, gradient boosting, deep learning basics, CNN, RNN; see my undergrad course). If you don't know much machine learning (e.g., you cannot yet derive the dual of the SVM), please do NOT take this course. I will recall some concepts briefly when necessary.

I won't strictly follow the above order. I may skip some of the topics mentioned above and cover some topics not mentioned above. It is a graduate course.

I will also talk about several applications of ML and optimization in finance (trading, pricing derivatives, etc.), and of course in typical CS areas such as vision, NLP, and social networks.

I will teach about 2/3 to 3/4 of the classes. For the rest, I will choose some topics and students will give class presentations.

Tentative topics for class presentations: generative models (GANs), adversarial learning and robustness, unsupervised learning (co-training, pseudo-labeling, contrastive learning), meta-learning, AutoML, and various financial applications.

Basic machine learning knowledge is a must (see, e.g., Andrew Ng's undergrad lecture notes).

The course may use various mathematical tools from convex optimization, spectral theory, matrix perturbation, probability, high-dimensional geometry, functional analysis, Fourier analysis, real algebraic geometry, stochastic differential equations, information theory, and so on. Only standard CS undergrad math and machine learning knowledge are required; beyond that, the course will be self-contained. However, a certain level of mathematical maturity is required.

Some knowledge of convex optimization may be useful. See this course (by S. Boyd) and a previous course of mine. But it is fine if you haven't taken those courses.

The course is a blend of theory and practice. We will cover both the underlying mathematics and interesting heuristics.

 


Grading:

  1. Homework (30 pts; one homework every two or three weeks)
  2. Scribe notes (10 pts): each student should take notes for at least one lecture (maybe two), using LaTeX (use this template: sample.tex, algorithm2e.sty).
  3. Course project (60 pts: 5 pts for the mid-term report, 15 pts for the final presentation, 40 pts for the final report)
  4. No exam. 

 


Schedule:

 

Feb 20 Gaussian Process

Basics of Brownian Motion

Stochastic differential equation (SDE)

Diffusion process

Ito Integral, Ito Process, Ito's Lemma
optional reading:

Stochastic Calculus, Filtering, and Stochastic Control (an excellent introductory book for SDE)
scribed notes 
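To get a concrete feel for the Feb 20 topics, here is a small illustrative Python sketch (not part of the required material): it simulates geometric Brownian motion dX = mu*X dt + sigma*X dW with the Euler-Maruyama scheme and numerically checks the drift correction predicted by Itô's lemma for log X. All parameter values below are arbitrary choices for illustration.

import numpy as np

# Illustrative sketch: simulate geometric Brownian motion
#   dX_t = mu * X_t dt + sigma * X_t dW_t
# with Euler-Maruyama, and check Ito's lemma for Y = log X:
#   dY_t = (mu - sigma^2 / 2) dt + sigma dW_t.
# Parameter values are arbitrary.

rng = np.random.default_rng(0)
mu, sigma, x0 = 0.05, 0.2, 1.0
T, n_steps, n_paths = 1.0, 1000, 20000
dt = T / n_steps

X = np.full(n_paths, x0)
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), size=n_paths)  # Brownian increments
    X = X + mu * X * dt + sigma * X * dW             # Euler-Maruyama step

# Ito's lemma predicts E[log X_T] = log x0 + (mu - sigma^2/2) * T
print("empirical  E[log X_T]:", np.log(X).mean())
print("Ito's lemma prediction:", np.log(x0) + (mu - 0.5 * sigma**2) * T)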
Feb 27
Langevin Dynamics, OU process

PDE and SDE: Feynman-Kac formula, Fokker-Planck equation.

Reverse time diffusion equation (Anderson's theorem)

Score-based generative models
(SMLD, DDPM)
optional reading:
Stochastic Calculus, Filtering, and Stochastic Control (an excellent introductory book for SDE)

The Fokker-Planck equation

Reverse-Time Diffusion Equation Models, by Anderson

Score-Based Generative Modeling through Stochastic Differential Equations
scribed notes 
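A minimal sketch of the link between the OU process / Langevin dynamics and score-based sampling discussed on Feb 27: unadjusted Langevin dynamics driven by the score of a 1D Gaussian target. Here the score is known in closed form; in score-based generative models it would be learned by a network. Step size, target parameters, and iteration counts are arbitrary choices.

import numpy as np

# Sketch: unadjusted Langevin dynamics
#   x_{k+1} = x_k + eta * score(x_k) + sqrt(2*eta) * N(0, 1)
# targeting a 1D Gaussian N(m, s^2), whose score is (m - x) / s^2.
# (This is also a discretized OU process.)

rng = np.random.default_rng(0)
m, s = 2.0, 0.5
score = lambda x: (m - x) / s**2

eta, n_steps, n_chains = 1e-3, 5000, 10000
x = rng.normal(size=n_chains)          # arbitrary initialization
for _ in range(n_steps):
    x = x + eta * score(x) + np.sqrt(2 * eta) * rng.normal(size=n_chains)

print("sample mean/std:", x.mean(), x.std())   # should approach 2.0 and 0.5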
Mar 6 Probability flow ODE

Stable diffusion

ControlNet

Radon–Nikodym derivative
optional reading:

Score-Based Generative Modeling through Stochastic Differential Equations

High-Resolution Image Synthesis with Latent Diffusion Models



Adding Conditional Control to Text-to-Image Diffusion Models

Slides

scribed notes 
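A toy sketch of the probability flow ODE from the Mar 6 lecture, for a VP-SDE whose data distribution is Gaussian so that the score is available in closed form (in a real diffusion model the score would be a trained network). All constants are arbitrary.

import numpy as np

# Sketch: sampling with the probability flow ODE for a toy VP-SDE where the
# data distribution is N(0, sigma0^2), so the score is analytic:
#   score(x, t) = -x / sigma_t^2,
#   sigma_t^2 = sigma0^2*exp(-beta*t) + 1 - exp(-beta*t).
# Probability flow ODE (deterministic counterpart of the reverse SDE):
#   dx/dt = -0.5*beta*x - 0.5*beta*score(x, t).

rng = np.random.default_rng(0)
beta, sigma0, T, n_steps, n_samples = 2.0, 0.5, 3.0, 2000, 50000
dt = T / n_steps

def sigma_t2(t):
    return sigma0**2 * np.exp(-beta * t) + 1.0 - np.exp(-beta * t)

def ode_drift(x, t):
    score = -x / sigma_t2(t)
    return -0.5 * beta * x - 0.5 * beta * score

x = rng.normal(0.0, np.sqrt(sigma_t2(T)), size=n_samples)   # start from the prior
for k in range(n_steps, 0, -1):                             # integrate backward in time
    t = k * dt
    x = x - dt * ode_drift(x, t)

print("sample std:", x.std(), " target sigma0:", sigma0)    # should be close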
Mar 13 (guest lecture by Qiang Liu)

Optimal transport, Rectified flow, diffusion bridges

optional reading
Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow

Rectified flow: A marginal preserving approach to optimal transport

Let us Build Bridges: Understanding and Extending Diffusion Generative Models
scribed notes 
Mar 20 Adversarial robustness and score-based generative models: adversarial training, adversarial purification

Application: stable diffusion for recovering brain activity (from fMRI data)

Girsanov Theorem

Student presentation (by Simian Luo): Poisson Flow Generative Models

optional reading:

Adversarial purification with score-based generative models

 

Diffusion Models for Adversarial Purification

 

High-resolution image reconstruction with latent diffusion models from human brain activity

Poisson Flow Generative Models

Girsanov Theorem: Section 4.5 of Stochastic Calculus, Filtering, and Stochastic Control
Slides

scribed notes 
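To illustrate the kind of attack that adversarial training and purification defend against, here is a minimal numpy sketch of an FGSM-style (gradient-sign) perturbation on a linear logistic classifier. The weights, data, and epsilon are synthetic and arbitrary; for a linear model the input gradient is available in closed form, so no autodiff is needed.

import numpy as np

# Sketch: FGSM-style attack x_adv = x + eps * sign(grad_x loss) on a linear
# logistic classifier. For logistic loss with p = sigmoid(w.x), the input
# gradient is (p - y) * w.

rng = np.random.default_rng(0)
d, n = 20, 500
w = rng.normal(size=d)                       # "trained" weights (synthetic)
X = rng.normal(size=(n, d))
y = (X @ w > 0).astype(float)                # labels consistent with w

def predict(X):
    return (X @ w > 0).astype(float)

eps = 0.1
p = 1.0 / (1.0 + np.exp(-(X @ w)))
X_adv = X + eps * np.sign((p - y)[:, None] * w[None, :])   # L_inf perturbation

print("clean accuracy:", (predict(X) == y).mean())
print("adversarial accuracy:", (predict(X_adv) == y).mean())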
Mar 27 Markov Semigroup, Generator

Poincaré inequality, convergence of Markov processes

Theory for diffusion models

optional reading:

Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions

(Poincaré inequality and convergence of Markov processes) The Fokker-Planck equation

Convergence of (discrete) Markov Chain (i.e., random walk on graphs)
Slides

scribed notes 
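A small numerical companion to the Mar 27 lecture: the lazy random walk on a cycle graph, its spectral gap, and the decay of total variation distance to the stationary (uniform) distribution. The graph and chain are toy choices for illustration only.

import numpy as np

# Sketch: convergence of a lazy random walk on a cycle graph; the spectral gap
# 1 - lambda_2 controls the mixing rate.

n = 30
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5                       # lazy self-loop
    P[i, (i - 1) % n] = 0.25
    P[i, (i + 1) % n] = 0.25

pi = np.full(n, 1.0 / n)                # stationary distribution (uniform)

eigvals = np.sort(np.linalg.eigvals(P).real)[::-1]
print("spectral gap:", 1.0 - eigvals[1])

for t in [0, 10, 100, 1000]:
    # distribution after t steps, starting from vertex 0
    dist = 0.5 * np.abs(np.linalg.matrix_power(P, t)[0] - pi).sum()
    print(f"total variation distance after {t} steps: {dist:.4f}")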
Apr 3 generalization, uniform convergence

maxima of Gaussian process, Chaining, covering number, Dudley integral

Empirical process: symmetrization, Rademacher complexity, VC-dimension 

optional reading:

We follow the exposition in [Book] Probability in High Dimension

scribed notes 
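As a quick illustration of the Rademacher complexity introduced in the Apr 3 lecture, the sketch below estimates the empirical Rademacher complexity of the unit-norm linear class by Monte Carlo, for which the supremum has a closed form, and compares it with the standard norm-based bound. The data are synthetic.

import numpy as np

# Sketch: Monte Carlo estimate of the empirical Rademacher complexity
#   R_S(F) = E_sigma [ sup_{f in F} (1/n) * sum_i sigma_i f(x_i) ]
# for F = { x -> <w, x> : ||w||_2 <= 1 }. For this class the supremum equals
# || (1/n) * sum_i sigma_i x_i ||_2, and the classical bound is
#   R_S(F) <= max_i ||x_i||_2 / sqrt(n).

rng = np.random.default_rng(0)
n, d, n_mc = 200, 10, 2000
X = rng.normal(size=(n, d))

sup_vals = []
for _ in range(n_mc):
    sigma = rng.choice([-1.0, 1.0], size=n)       # Rademacher signs
    sup_vals.append(np.linalg.norm(sigma @ X) / n)

print("Monte Carlo estimate:", np.mean(sup_vals))
print("norm-based upper bound:", np.linalg.norm(X, axis=1).max() / np.sqrt(n))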
Apr 10 Fat-shattering dimension, margin-based generalization

rethinking generalization

double descent 

optional reading:

Foundations of Machine Learning, Sec. 4.4: Margin theory.

Spectrally-normalized margin bounds for neural networks


Understanding Deep Learning Requires Rethinking Generalization

Reconciling modern machine learning practice and the bias-variance trade-off

scribed notes 
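In the spirit of the "rethinking generalization" reading above, this toy sketch shows an over-parameterized linear model interpolating completely random labels: zero training error, chance-level test error, so a small training error alone cannot explain generalization. All data are synthetic.

import numpy as np

# Sketch: an over-parameterized linear model (d > n) can fit pure-noise labels.

rng = np.random.default_rng(0)
n, d = 50, 200                                   # more parameters than samples
X = rng.normal(size=(n, d))
y = rng.choice([-1.0, 1.0], size=n)              # labels are pure noise

w = np.linalg.pinv(X) @ y                        # minimum-norm interpolator
print("train error:", np.mean(np.sign(X @ w) != y))          # 0.0: perfect fit

X_test = rng.normal(size=(1000, d))
y_test = rng.choice([-1.0, 1.0], size=1000)      # also random labels
print("test error:", np.mean(np.sign(X_test @ w) != y_test)) # ~0.5: chance level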
Apr 17 A bit of random matrix theory

Double descent in linear model

Grokking

Failure of uniform convergence 

optional reading:

Uniform Convergence May Be Unable to Explain Generalization in Deep Learning

[Book] Random Matrix Methods for Machine Learning

[lecture note] Double Descent in Linear Models

Model, sample, and epoch-wise descents: exact solution of gradient flow in the random feature model

Unifying Grokking and Double Descent
scribed notes 
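A toy illustration of double descent in the linear model (Apr 17): the test error of the minimum-norm least-squares interpolator as the number of features d crosses the sample size n, typically peaking near the interpolation threshold d ≈ n and descending again beyond it. The data-generating model and noise level are arbitrary.

import numpy as np

# Sketch: double descent of min-norm least squares in a toy linear regression.

rng = np.random.default_rng(0)
n, d_max, noise = 100, 400, 0.5
w_star = rng.normal(size=d_max) / np.sqrt(d_max)   # ground-truth coefficients

X_full = rng.normal(size=(n, d_max))
y = X_full @ w_star + noise * rng.normal(size=n)
X_test_full = rng.normal(size=(2000, d_max))
y_test = X_test_full @ w_star + noise * rng.normal(size=2000)

for d in [20, 50, 80, 95, 100, 105, 120, 200, 400]:
    X, X_test = X_full[:, :d], X_test_full[:, :d]   # use the first d features
    w_hat = np.linalg.pinv(X) @ y                   # min-norm least squares
    test_mse = np.mean((X_test @ w_hat - y_test) ** 2)
    print(f"d = {d:4d}   test MSE = {test_mse:10.3f}")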
Apr 18 Generalization by Algorithmic stability

PAC-Bayesian bounds 

optional reading:

Foundations of Machine Learning, Sec. 11: Algorithmic stability

Train faster, generalize better: Stability of stochastic gradient descent

Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints

On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning


Understanding Machine Learning: From Theory to Algorithms, Ch. 31: PAC-Bayes.

scribed notes 
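A small empirical illustration of algorithmic stability: replace a single training example and measure how much the ridge-regression solution changes, for different sample sizes and regularization strengths. Stronger regularization and more data give a more stable algorithm, which is the mechanism behind stability-based generalization bounds. All parameter choices are arbitrary.

import numpy as np

# Sketch: stability of ridge regression under replacing one training example.

rng = np.random.default_rng(0)

def ridge(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * len(y) * np.eye(d), X.T @ y)

d = 20
w_star = rng.normal(size=d)
for n in [50, 200, 1000]:
    X = rng.normal(size=(n, d))
    y = X @ w_star + 0.1 * rng.normal(size=n)
    X2, y2 = X.copy(), y.copy()
    X2[0], y2[0] = rng.normal(size=d), rng.normal()   # replace one example
    for lam in [0.01, 1.0]:
        diff = np.linalg.norm(ridge(X, y, lam) - ridge(X2, y2, lam))
        print(f"n = {n:5d}  lambda = {lam:5.2f}  ||w - w'|| = {diff:.4f}")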

 

 


References:

[Book] Introduction to online convex optimization

[Book] Prediction, Learning, and Games

[Book] Options, Futures and Other Derivatives  

[Book] Advances in Financial Machine Learning

[Book] Convex Optimization

[Book] Foundations of Machine Learning

[Book] Probability in High Dimension

[Book] Understanding Machine Learning: From Theory to Algorithms

Lecture notes for STAT928: Statistical Learning and Sequential Prediction 


Python is the default programming language we will use in the course.

If you haven't used it before, don't worry. It is very easy to learn (if you know any other programming language) and is a very efficient language, especially for prototyping things related to scientific computing and numerical optimization. Python code is usually much shorter than C/C++ code (the language does a lot for you). It is also more flexible and generally faster than MATLAB.

A standard combination for this class is Python + numpy (a numerical library for Python) + scipy (a scientific computing library for Python) + matplotlib (for generating nice plots). A minimal example of this combination is sketched below.
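For example (a made-up toy task, not a course assignment): generate noisy data with numpy, fit a curve with scipy.optimize, and plot the result with matplotlib.

import numpy as np
from scipy import optimize
import matplotlib.pyplot as plt

# Toy example: fit a noisy quadratic by least squares and plot the fit.
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 100)
y = 1.5 * x**2 - 0.5 * x + 0.3 + 0.2 * rng.normal(size=x.size)

def residuals(params):
    a, b, c = params
    return a * x**2 + b * x + c - y

fit = optimize.least_squares(residuals, x0=np.zeros(3))
print("fitted coefficients:", fit.x)

plt.scatter(x, y, s=10, label="noisy data")
plt.plot(x, fit.x[0] * x**2 + fit.x[1] * x + fit.x[2], "r-", label="least-squares fit")
plt.legend()
plt.show()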

Another, somewhat easier, way is to install Anaconda (a free Python distribution that comes with most of the popular packages).