ATCS – Selected Topics in Learning, Prediction, and Optimization (with applications in Finance)

2024 Spring

Lecturer: Jian Li ( lapordge at gmail dot com)

TA: Zeren Tan (tanzr20@mails.tsinghua.edu.cn)

Time: every Monday, 9:50am-12:15pm

Room: 4201, Teaching Building 4 (四教)


We intend to cover a subset of the following topics (tentative):

Prerequisites: I assume you already know all the basics (convex optimization and machine learning, stochastic gradient descent, gradient boosting, deep learning basics, CNN, RNN; please see my undergrad course). If you don't know much machine learning (e.g., you do not yet know how to derive the dual of SVM), please do NOT take this course. I will recall some concepts briefly when necessary.

Topics: (1) statistical learning theory; (2) theory of deep learning; (3) some (new) topics in ML: diffusion models, LLMs, robustness, explainable AI, fairness, calibration.

I won't strictly follow the above order. I may skip some topics mentioned above and cover some that are not mentioned. It is a graduate course.

I will also talk about several applications of ML and optimization in finance (trading, pricing derivatives, etc.), and of course in typical CS areas such as vision, NLP, and social networks.

I will teach about 2/3 to 3/4 of the classes. For the rest, I will choose some topics and students will give class presentations.

Tentative topics for class presentation: generative models (GAN), adversarial learning and robustness, unsupervised learning (co-training, pseudolabeling, contrastive learning), meta-learning, AutoML, various financial applications.

Basic machine learning knowledge is a must (see, e.g., Andrew Ng's undergrad lecture notes).

The course may use various mathematical tools from convex optimization, spectral theory, matrix perturbation, probability, high-dimensional geometry, functional analysis, Fourier analysis, real algebraic geometry, stochastic differential equations, information theory, and so on. Only standard CS undergrad math and machine learning knowledge is required; otherwise, the course will be self-contained. However, a certain level of mathematical maturity is required.

Some knowledge of convex optimization may be useful; see this course (by S. Boyd) and a previous course of mine. But it is fine if you haven't taken those courses.

The course is a blend of theory and practice. We will cover both the underlying mathematics and interesting heuristics.

 


Grading:

  1. Homework (20 pts; 3-4 assignments)
  2. Taking notes (10 pts): each student should take notes for at least one lecture (maybe two), using LaTeX (use this template: sample.tex, algorithm2e.sty).
  3. Class participation / class presentation (10 pts)
  4. Course project (60 pts: 10 pts for the mid-term report, 10 pts for the final presentation, 40 pts for the final report)
  5. No closed-book exam.

 


Schedule:

 

Feb 26 Introduction of the course

Gaussian Process (the regression formulas are sketched after this block)
Basics of Brownian Motion
Stochastic differential equation (SDE)
Diffusion process
optional reading:
Stochastic Calculus, Filtering, and Stochastic Control (an excellent introductory book for SDE)
scribed notes
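A quick reminder of the Gaussian process regression formulas for this lecture (a minimal sketch, assuming a zero-mean GP prior with kernel k and i.i.d. Gaussian observation noise of variance \sigma^2; notation may differ from the scribed notes):

    \mu(x_*) = k(x_*, X)\,[K(X,X) + \sigma^2 I]^{-1} y, \qquad
    \Sigma(x_*) = k(x_*, x_*) - k(x_*, X)\,[K(X,X) + \sigma^2 I]^{-1} k(X, x_*),

where (X, y) are the training inputs/targets, K(X,X) is the kernel (Gram) matrix, and x_* is a test input.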
Mar 4 Ito Integral, Ito Process, Ito's Lemma (sketched after this block), Feynman-Kac, Fokker-Planck, Intro to generative diffusion processes
optional reading:
Stochastic Calculus, Filtering, and Stochastic Control (an excellent introductory book for SDE),
Score-Based Generative Modeling through Stochastic Differential Equations
scribed notes
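For reference, the one-dimensional form of Ito's lemma covered in this lecture (a sketch, assuming f is C^{1,2} and X_t is an Ito process; see the optional reading for the rigorous statement):

    dX_t = b_t\,dt + \sigma_t\,dW_t \;\Longrightarrow\;
    df(t, X_t) = \big(\partial_t f + b_t\,\partial_x f + \tfrac{1}{2}\sigma_t^2\,\partial_{xx} f\big)\,dt + \sigma_t\,\partial_x f\,dW_t.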
Mar 11 Diffusion Process
Score-based Generative Diffusion Models: SMLD, DDPM (the forward/reverse SDEs are sketched after this block)
Probability Flow ODE
Variational Perspective of the Diffusion Process: DDIM
Score-Based Generative Modeling through Stochastic Differential Equations
Denoising Diffusion Probabilistic Models
Denoising Diffusion Implicit Models
scribed notes
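The SDEs behind the score-based models in this lecture, in the notation of the Score-Based Generative Modeling paper listed above (a sketch):

    forward:           dx = f(x,t)\,dt + g(t)\,dw
    reverse-time:      dx = \big[f(x,t) - g(t)^2\,\nabla_x \log p_t(x)\big]\,dt + g(t)\,d\bar{w}
    probability flow:  dx = \big[f(x,t) - \tfrac{1}{2} g(t)^2\,\nabla_x \log p_t(x)\big]\,dt

where \nabla_x \log p_t(x) is the score, estimated by the learned score network.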
Mar 18 DPM-Solver
VQ-GAN
Latent Diffusion Models (Stable Diffusion), ControlNet
Consistency Models
Latent Consistency Models (LCM), LCM-LoRA (the LoRA parameterization is sketched after this block)
DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps
High-Resolution Image Synthesis with Latent Diffusion Models
Adding Conditional Control to Text-to-Image Diffusion Models
Consistency Models
Latent Consistency Models Synthesizing High-Resolution Images with Few-step Inference
LoRA: Low-Rank Adaptation of Large Language Models
scribed notes
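The LoRA parameterization referenced in this lecture (a sketch following the LoRA paper; r is the low rank and \alpha a scaling hyperparameter): for a frozen pretrained weight W_0 \in \mathbb{R}^{d\times k},

    h = W_0 x + \tfrac{\alpha}{r}\, B A x, \qquad B \in \mathbb{R}^{d\times r},\; A \in \mathbb{R}^{r\times k},\; r \ll \min(d, k),

with A initialized randomly and B initialized to zero, so only the small matrices A and B are trained.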
Mar 25 Rectified Flow (the linear-interpolation objective is sketched after this block)
DiT (diffusion transformer)
ViT (vision transformer)
Flow Matching
Stable Diffusion 3
SORA
Discussion
Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow
Flow Matching for Generative Modeling
Scalable Diffusion Models with Transformers
https://stability.ai/news/stable-diffusion-3-research-paper
https://openai.com/sora
https://github.com/hpcaitech/Open-Sora
scribed notes
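The core training objective behind rectified flow / flow matching with linear interpolation paths (a sketch; X_0 denotes noise and X_1 data):

    X_t = (1-t)\,X_0 + t\,X_1, \qquad
    \min_\theta\; \mathbb{E}\,\big\| v_\theta(X_t, t) - (X_1 - X_0) \big\|^2,

and samples are generated by integrating the ODE dZ_t = v_\theta(Z_t, t)\,dt from Z_0 drawn from the noise distribution.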
Apr 1 Quick review of classical statistical learning theory,
Symmetrization (sketched after this block), Chaining, Covering number, VC-dimension
We follow the exposition from [Book] Probability in High Dimension
scribed notes
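The symmetrization step from this lecture, in one standard form (a sketch; \varepsilon_i are i.i.d. Rademacher signs, P f = \mathbb{E} f(X) and P_n f = \tfrac{1}{n}\sum_i f(X_i)):

    \mathbb{E}\Big[\sup_{f\in\mathcal{F}} \big(P f - P_n f\big)\Big] \;\le\; 2\,\mathbb{E}\Big[\sup_{f\in\mathcal{F}} \tfrac{1}{n}\sum_{i=1}^n \varepsilon_i f(X_i)\Big],

i.e., the expected uniform deviation is controlled by (twice) the Rademacher complexity of the class.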
Apr 8 Pseudo-dimension, Fat-shattering dimension,
Margin Theory,
Intro to deep learning theory
Foundations of Machine Learning, Sec. 4.4 (Margin Theory)
Understanding deep learning requires rethinking generalization
Spectrally-normalized margin bounds for neural networks

Stronger Generalization Bounds for Deep Nets via a Compression Approach
Uniform convergence may be unable to explain generalization in deep learning
scribed notes
Apr 15 Algorithmic Stability (uniform stability is sketched after this block)
Generalization of SGD (convex setting)
Generalization of SGLD (nonconvex)
Uniform convergence in deep learning
Generalization measure
Uniform convergence may be unable to explain generalization in deep learning
Train faster, generalize better: Stability of stochastic gradient descent
On generalization error bounds of noisy gradient methods for non-convex learning
Fantastic generalization measures and where to find them
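The notion of uniform stability behind the bounds in this lecture (a sketch, stated in expectation over the algorithm's randomness; constants and variants differ across the papers above): an algorithm A is \varepsilon-uniformly stable if for all datasets S, S' differing in one example,

    \sup_z\; \mathbb{E}_A\big[\ell(A(S), z) - \ell(A(S'), z)\big] \;\le\; \varepsilon,

and then the expected generalization gap satisfies \big|\mathbb{E}\big[L(A(S)) - \hat{L}_S(A(S))\big]\big| \le \varepsilon.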
 
Apr 22 PAC-Bayesian framework (one standard bound is sketched after this block)
A nonvacuous generalization bound based on the PAC-Bayesian framework
Generalization of SGLD (with l2 regularization)
Quick intro to convergence of Markov processes (Poincaré inequality, log-Sobolev inequality)
Kaiyue presented his work on SAM (this paper and this paper)
Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data
Generalization bounds of SGLD for non-convex learning: two theoretical viewpoints
Generalization bounds for gradient methods via discrete and continuous prior
Sharpness-Aware Minimization (SAM) for Efficiently Improving Generalization
Logarithmic Sobolev Inequalities Essentials
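One common form of the PAC-Bayes bound from this lecture (a sketch; constants and log factors vary across references): with probability at least 1-\delta over an i.i.d. sample of size n, simultaneously for all posteriors \rho and a fixed prior \pi,

    \mathbb{E}_{h\sim\rho}\big[L(h)\big] \;\le\; \mathbb{E}_{h\sim\rho}\big[\hat{L}(h)\big] + \sqrt{\frac{\mathrm{KL}(\rho\,\|\,\pi) + \ln\frac{2\sqrt{n}}{\delta}}{2n}}.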
 
Apr 29 Adversarial Robustness:
Shortcut in Learning
Adversarial examples
Attack: FGSM, PGD
Defense: Data Augmentation, Adversarial Training (AT), Diffusion-based
Certified robustness
Randomized Smoothing (Neyman-Pearson Lemma); the certified radius is sketched after this block
Robustness of MLLM
Dimpled manifold model
Shortcut learning in deep neural networks
Explaining and harnessing adversarial examples
Certified adversarial robustness via randomized smoothing
(Certified!!) Adversarial Robustness for Free!
Adversarial purification with score-based generative models

Adversarial Robustness Benchmark
How Robust is Google's Bard to Adversarial Image Attacks?
The Dimpled Manifold Model of Adversarial Examples in Machine Learning
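The certified radius from randomized smoothing (Cohen et al., cited above), stated as a sketch: for the smoothed classifier g(x) = \arg\max_c \Pr_{\varepsilon\sim\mathcal{N}(0,\sigma^2 I)}[f(x+\varepsilon)=c], if the top class has probability at least \underline{p_A} and every other class at most \overline{p_B}, then g is constant on an \ell_2 ball around x of radius

    R = \tfrac{\sigma}{2}\big(\Phi^{-1}(\underline{p_A}) - \Phi^{-1}(\overline{p_B})\big),

where \Phi^{-1} is the standard Gaussian quantile function.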

 
May 6 Adversarial Robustness:
Robust features and nonrobust features
Dimpled manifold model

Implicit Bias in Deep Learning:
Margin Maximization
Simplicity Bias: simple classification boundaries, low-rank solutions, low-frequency solutions
Early phase of GD: behaves like a linear model
Feature Averaging (leads to non-robust solutions)
Sharpness Minimization
Adversarial examples are not bugs, they are features
The Dimpled Manifold Model of Adversarial Examples in Machine Learning
Gradient Descent Maximizes the Margin of Homogeneous Neural Networks
Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias
The Pitfalls of Simplicity Bias in Neural Networks
On the Spectral Bias of Neural Networks
Fourier Analysis Sheds Light on Deep Neural Networks

The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks

 
May 13 Reproducing Kernel Hilbert Space
RKHS
Maximum Mean Discrepancy (MMD) (the population form is sketched after this block)
A Primer on Reproducing Kernel Hilbert Spaces
A Kernel Two-Sample Test
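The population form of MMD from this lecture (a sketch; k is the kernel, x, x' \sim P and y, y' \sim Q independently):

    \mathrm{MMD}^2(P, Q) = \mathbb{E}\big[k(x, x')\big] - 2\,\mathbb{E}\big[k(x, y)\big] + \mathbb{E}\big[k(y, y')\big] = \big\| \mu_P - \mu_Q \big\|_{\mathcal{H}}^2,

which is the quantity estimated by the kernel two-sample test cited above.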
 
May 20
Guest lecture
Dingli Yu (Tensor program)
Kaifeng Lyu (Grokking)
 

Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking

 
May 27 Gaussian Process and Kernel Regression
Random Fourier Features (a small code sketch follows this block)
Kernel Generalization Bounds
Neural Tangent Kernel
A Primer on Reproducing Kernel Hilbert Spaces
Universality, characteristic kernels and RKHS embedding of measures
Gaussian Processes and Kernel Methods: A Review on Connections and Equivalences
Rademacher and Gaussian complexities: Risk bounds and structural results
Learning Kernels Using Local Rademacher Complexity
Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
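A minimal sketch (not course code) of random Fourier features for the RBF kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2)), following Rahimi & Recht; the function name and parameters are illustrative only:

    import numpy as np

    def rff_features(X, D=500, sigma=1.0, seed=0):
        # Random features z(x) with E[z(x) . z(y)] = exp(-||x - y||^2 / (2 sigma^2))
        rng = np.random.default_rng(seed)
        d = X.shape[1]
        W = rng.normal(0.0, 1.0 / sigma, size=(d, D))  # frequencies ~ N(0, sigma^{-2} I)
        b = rng.uniform(0.0, 2 * np.pi, size=D)        # random phases
        return np.sqrt(2.0 / D) * np.cos(X @ W + b)

    # sanity check: feature inner products approximate the exact kernel matrix
    X = np.random.default_rng(1).normal(size=(5, 3))
    Z = rff_features(X, D=5000)
    K = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 2.0)  # sigma = 1
    print(np.abs(Z @ Z.T - K).max())  # small Monte Carlo error

Running linear or ridge regression on z(X) then approximates the exact kernel method at cost roughly linear in D.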
 
Jun 3 Self-supervised / contrastive learning: Word2Vec, DeepWalk, and the relation between contrastive learning and spectral clustering

 

 


References:

[Book] Introduction to Online Convex Optimization

[Book] Prediction, Learning, and Games

[Book] Options, Futures and Other Derivatives  

[Book] Advances in Financial Machine Learning

[Book] Convex Optimization

[Book] Foundations of Machine Learning

[Book] Probability in High Dimension

[Book] Understanding Machine Learning: From Theory to Algorithms

Lecture notes for STAT928: Statistical Learning and Sequential Prediction 


Python is the default programming language we will use in the course.

If you haven't used it before, don't worry. It is very easy to learn (if you know any other programming language) and is a very efficient language, especially for prototyping things related to scientific computing and numerical optimization. Python code is usually much shorter than the equivalent C/C++ code (the language does a lot for you). It is also more flexible and generally faster than MATLAB.

A standard combination for this class is Python + NumPy (a numerical library for Python) + SciPy (a scientific computing library for Python) + matplotlib (for generating nice plots); a minimal example is sketched below.
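A minimal sketch of that combination (nothing course-specific; just to show the style): fit a line by least squares with scipy.optimize and plot the result.

    import numpy as np
    from scipy.optimize import minimize
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    x = np.linspace(0, 1, 50)
    y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=x.size)    # noisy line

    loss = lambda p: np.sum((p[0] * x + p[1] - y) ** 2)  # squared error in (slope, intercept)
    res = minimize(loss, x0=np.zeros(2))                 # quasi-Newton method by default

    plt.scatter(x, y, s=10, label="data")
    plt.plot(x, res.x[0] * x + res.x[1], "r", label="least-squares fit")
    plt.legend()
    plt.show()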

Another somewhat easier way is to install Anaconda (it is a free Python distribution with most popular packages).