ATCS – Selected Topics in Learning, Prediction, and Optimization (with applications in Finance)
2024 Spring
Lecturer: Jian Li (lapordge at gmail dot com)
TA: Zeren Tan (tanzr20@mails.tsinghua.edu.cn)
Time: every Monday, 9:50am-12:15pm
Room: 4201, Fourth Teaching Building (四教)
We intend to cover a subset of the following topics (tentative):
(1) statistical learning theory
(2) theory of deep learning
(3) some (new) topics in ML: diffusion, LLM, robustness, explainable AI, fairness, calibration
Prerequisites: I assume you already know the basics (convex optimization and machine learning, stochastic gradient descent, gradient boosting, deep learning basics, CNN, RNN; please see my undergrad course). If you don't know much machine learning (e.g., you do not yet know how to derive the dual of SVM), please do NOT take this course. I will recall some concepts briefly when necessary.
I won't strictly follow the above order. I may skip some of the topics mentioned above and cover others that are not mentioned. It is a graduate course.
I will also discuss several applications of ML and optimization in finance (trading, pricing derivatives, etc.), and of course in typical CS areas such as vision, NLP, and social networks.
I will teach about 2/3 to 3/4 of the classes. For the rest, I will choose some topics, and students will give class presentations on them.
Tentative topics for class presentation: generative models (GAN), adversarial learning and robustness, unsupervised learning (co-training, pseudo-labeling, contrastive learning), meta-learning, AutoML, various financial applications.
Basic machine learning knowledge is a must (see, e.g., Andrew Ng's undergrad lecture notes).
The course may use various math tools from convex optimization, spectral theory, matrix perturbation, probability, high-dimensional geometry, functional analysis, Fourier analysis, real algebraic geometry, stochastic differential geometry, information theory, and so on. Only standard CS undergrad math and machine learning knowledge are required; otherwise the course will be self-contained. However, a certain level of mathematical maturity is required.
Some knowledge of convex optimization may be useful; see this course (by S. Boyd) and a previous course of mine. It is fine if you have not taken those courses.
The course is a blend of theory and practice. We will cover both the underlying mathematics and interesting heuristics.
Grading:
Schedule:
Feb 26: Introduction to the course, Gaussian processes, basics of Brownian motion, stochastic differential equations (SDE), diffusion processes
optional reading:
Stochastic Calculus, Filtering, and Stochastic Control (an excellent introductory book for SDE) 
scribed notes 
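To make the Feb 26 topics concrete, here is a minimal sketch (not part of the official course materials) that simulates an SDE dX = b(X) dt + σ(X) dW with the Euler-Maruyama scheme, X_{k+1} = X_k + b(X_k) Δt + σ(X_k) ΔW_k with ΔW_k ~ N(0, Δt). With b = 0 and σ = 1 it produces standard Brownian motion. The function name `euler_maruyama` and its parameters are illustrative choices, and only the standard library is used.

```python
import math
import random
from typing import Callable, List


def euler_maruyama(
    drift: Callable[[float], float],
    diffusion: Callable[[float], float],
    x0: float,
    T: float,
    n_steps: int,
    rng: random.Random,
) -> List[float]:
    """Simulate one path of dX = drift(X) dt + diffusion(X) dW on [0, T].

    Returns [X_0, X_1, ..., X_n] on a uniform grid with step T / n_steps.
    """
    dt = T / n_steps
    path = [x0]
    x = x0
    for _ in range(n_steps):
        # Brownian increment dW ~ N(0, dt), i.e. standard deviation sqrt(dt)
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + drift(x) * dt + diffusion(x) * dw
        path.append(x)
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    # Standard Brownian motion: zero drift, unit diffusion, so Var[W_1] = 1.
    finals = [
        euler_maruyama(lambda x: 0.0, lambda x: 1.0, 0.0, 1.0, 100, rng)[-1]
        for _ in range(2000)
    ]
    mean = sum(finals) / len(finals)
    var = sum((f - mean) ** 2 for f in finals) / (len(finals) - 1)
    print(f"sample variance of W_1: {var:.3f}")
```

With zero drift and unit diffusion, the sample variance of X_1 should come out near 1, matching Var[W_1] = 1; with σ = 0 the scheme reduces to the explicit Euler method for the ODE dX = b(X) dt.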
Mar 4: Ito integral, Ito processes, Ito's lemma, Feynman-Kac formula, Fokker-Planck equation, intro to generative diffusion processes
optional reading:
Stochastic Calculus, Filtering, and Stochastic Control (an excellent introductory book on SDEs); Score-Based Generative Modeling through Stochastic Differential Equations
scribed notes 
Mar 11: Diffusion processes, score-based generative diffusion models (SMLD, DDPM), probability flow ODE, variational perspective of diffusion processes, DDIM
Score-Based Generative Modeling through Stochastic Differential Equations; Denoising Diffusion Probabilistic Models; Denoising Diffusion Implicit Models
scribed notes 
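As a companion to the Mar 11 readings, here is a minimal sketch of the DDPM forward (noising) process in closed form, q(x_t | x_0) = N(sqrt(ᾱ_t) x_0, (1 - ᾱ_t) I) with ᾱ_t = ∏_{s≤t} (1 - β_s). It assumes the linear β schedule from 1e-4 to 0.02 over T = 1000 steps used in the DDPM paper; the helper names are hypothetical, and x_0 is a plain list of floats rather than an image tensor.

```python
import math
import random
from typing import List


def linear_beta_schedule(T: int = 1000, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Betas increasing linearly from beta_start (t = 1) to beta_end (t = T)."""
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]


def alpha_bars(betas: List[float]) -> List[float]:
    """Cumulative products alpha_bar_t = prod_{s <= t} (1 - beta_s)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0: List[float], t: int, abar: List[float],
             rng: random.Random) -> List[float]:
    """Draw x_t ~ q(x_t | x_0) for a 1-indexed step t in one shot (no loop over steps)."""
    a = abar[t - 1]
    return [math.sqrt(a) * xi + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for xi in x0]


if __name__ == "__main__":
    abar = alpha_bars(linear_beta_schedule())
    rng = random.Random(0)
    x0 = [2.0] * 5
    print("alpha_bar_1    =", abar[0])
    print("alpha_bar_1000 =", abar[-1])  # close to 0: x_T is almost pure noise
    print("x_1    ~", q_sample(x0, 1, abar, rng))     # still close to x0
    print("x_1000 ~", q_sample(x0, 1000, abar, rng))  # entries roughly N(0, 1)
```

The closed form is what makes DDPM training cheap: a random timestep can be noised directly instead of running t forward steps.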
Mar 18: DPM-Solver, VQGAN, latent diffusion models (Stable Diffusion), ControlNet, consistency models, latent consistency models (LCM), LCM-LoRA
DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps; High-Resolution Image Synthesis with Latent Diffusion Models; Adding Conditional Control to Text-to-Image Diffusion Models; Consistency Models; Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference; LoRA: Low-Rank Adaptation of Large Language Models
scribed notes 
Mar 25: Rectified flow, DiT (diffusion transformer), ViT (vision transformer), flow matching, Stable Diffusion 3, SORA discussion
Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow; Flow Matching for Generative Modeling; Scalable Diffusion Models with Transformers; https://stability.ai/news/stable-diffusion-3-research-paper; https://openai.com/sora; https://github.com/hpcaitech/Open-Sora
scribed notes 
Apr 1: Quick review of classical statistical learning theory, symmetrization, chaining, covering numbers, VC-dimension
We follow the exposition in the book [Book] Probability in High Dimension.
scribed notes
Apr 8: Pseudo-dimension, fat-shattering dimension, margin theory, intro to deep learning theory
Foundations of Machine Learning, Sec 4.4 (margin theory); Understanding deep learning requires rethinking generalization; Spectrally-normalized margin bounds for neural networks; Stronger Generalization Bounds for Deep Nets via a Compression Approach; Uniform convergence may be unable to explain generalization in deep learning
scribed notes 
Apr 15: Algorithmic stability, generalization of SGD (convex setting), generalization of SGLD (non-convex setting), uniform convergence in deep learning, generalization measures
Uniform convergence may be unable to explain generalization in deep learning; Train faster, generalize better: Stability of stochastic gradient descent; On generalization error bounds of noisy gradient methods for non-convex learning; Fantastic generalization measures and where to find them

Apr 22: PAC-Bayesian framework, a non-vacuous generalization bound based on PAC-Bayes, generalization of SGLD (with l2 regularization), quick intro to convergence of Markov processes (Poincare inequality, log-Sobolev inequality). Kaiyue presented his work on SAM (this paper and this paper).
Computing non-vacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data; Generalization bounds of SGLD for non-convex learning: Two theoretical viewpoints; Generalization bounds for gradient methods via discrete and continuous prior; Sharpness-Aware Minimization (SAM) for Efficiently Improving Generalization; Logarithmic Sobolev Inequalities Essentials

Apr 29: Adversarial robustness: shortcuts in learning, adversarial examples, attacks (FGSM, PGD), defenses (data augmentation, adversarial training (AT), diffusion-based defenses), certified robustness, randomized smoothing (Neyman-Pearson lemma), robustness of MLLMs, dimpled manifold model
Shortcut learning in deep neural networks; Explaining and harnessing adversarial examples; Certified adversarial robustness via randomized smoothing; (Certified!!) Adversarial Robustness for Free!; Adversarial purification with score-based generative models
Adversarial Robustness Benchmark
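FGSM (from "Explaining and harnessing adversarial examples" in the list above) perturbs an input by x_adv = x + ε · sign(∇_x L(θ, x, y)). The sketch below applies it to a hypothetical fixed linear classifier with logistic loss L = log(1 + exp(-y w·x)), y ∈ {-1, +1}, chosen only because its input gradient has the closed form -y σ(-y w·x) w; for a deep network one would get the gradient from automatic differentiation instead. The weights, data point, and ε are made up for illustration.

```python
import math
from typing import List


def _sign(v: float) -> int:
    """Return +1, -1, or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def logistic_loss(w: List[float], x: List[float], y: int) -> float:
    """log(1 + exp(-y * <w, x>)) for a label y in {-1, +1}."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    return math.log1p(math.exp(-margin))


def input_gradient(w: List[float], x: List[float], y: int) -> List[float]:
    """Gradient of the logistic loss with respect to the input x."""
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    coeff = -y / (1.0 + math.exp(margin))  # equals -y * sigmoid(-margin)
    return [coeff * wi for wi in w]


def fgsm(w: List[float], x: List[float], y: int, eps: float) -> List[float]:
    """One-step L-infinity attack: move each coordinate by eps along the gradient sign."""
    g = input_gradient(w, x, y)
    return [xi + eps * _sign(gi) for xi, gi in zip(x, g)]


if __name__ == "__main__":
    w = [1.5, -2.0, 0.5]        # hypothetical "trained" weights
    x, y = [0.4, -0.3, 0.2], 1  # correctly classified: <w, x> = 1.3 > 0
    x_adv = fgsm(w, x, y, eps=0.4)
    print("clean loss:", logistic_loss(w, x, y))
    print("adv loss:  ", logistic_loss(w, x_adv, y))
    print("adv score: ", sum(wi * xi for wi, xi in zip(w, x_adv)))  # now negative
```

For a linear model the attack lowers the margin y·(w·x) by exactly ε·||w||_1, which is the "linearity" explanation of adversarial examples given in that paper.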


May 6: Adversarial robustness: robust and non-robust features, dimpled manifold model. Implicit bias in deep learning: margin maximization; simplicity bias (simple classification boundaries, low-rank solutions, low-frequency solutions, early phase of GD behaving like a linear model); feature averaging (leads to non-robust solutions); sharpness minimization
Adversarial examples are not bugs, they are features; The Dimpled Manifold Model of Adversarial Examples in Machine Learning; Gradient Descent Maximizes the Margin of Homogeneous Neural Networks; Gradient Descent on Two-layer Nets: Margin Maximization and Simplicity Bias; The Pitfalls of Simplicity Bias in Neural Networks; On the Spectral Bias of Neural Networks; Fourier Analysis Sheds Light on Deep Neural Networks; The Surprising Simplicity of the Early-Time Learning Dynamics of Neural Networks

May 13: Reproducing kernel Hilbert spaces (RKHS), maximum mean discrepancy (MMD)
A Primer on Reproducing Kernel Hilbert Spaces; A Kernel Two-Sample Test
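To make the MMD topic concrete: for a kernel k, MMD²(P, Q) = E k(x, x') + E k(y, y') - 2 E k(x, y), where x, x' ~ P and y, y' ~ Q are independent. The sketch below computes the unbiased (U-statistic) sample estimate, one of the estimators discussed in A Kernel Two-Sample Test, for 1-D samples with a Gaussian kernel. The bandwidth of 1.0 and the synthetic data are assumptions for illustration; an actual test would also need a rejection threshold (e.g., from permutations), which is omitted here.

```python
import math
import random
from typing import List


def rbf(a: float, b: float, bandwidth: float = 1.0) -> float:
    """Gaussian kernel k(a, b) = exp(-(a - b)^2 / (2 * bandwidth^2))."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs: List[float], ys: List[float], bandwidth: float = 1.0) -> float:
    """Unbiased estimate of MMD^2 between the distributions that generated xs and ys."""
    m, n = len(xs), len(ys)
    # Within-sample terms exclude the diagonal i == j to remove the bias.
    kxx = sum(rbf(xs[i], xs[j], bandwidth) for i in range(m) for j in range(m) if i != j)
    kyy = sum(rbf(ys[i], ys[j], bandwidth) for i in range(n) for j in range(n) if i != j)
    kxy = sum(rbf(x, y, bandwidth) for x in xs for y in ys)
    return kxx / (m * (m - 1)) + kyy / (n * (n - 1)) - 2.0 * kxy / (m * n)


if __name__ == "__main__":
    rng = random.Random(0)
    p = [rng.gauss(0.0, 1.0) for _ in range(200)]
    q_same = [rng.gauss(0.0, 1.0) for _ in range(200)]
    q_shift = [rng.gauss(1.0, 1.0) for _ in range(200)]
    print("same distribution:   ", mmd2_unbiased(p, q_same))   # near 0
    print("shifted distribution:", mmd2_unbiased(p, q_shift))  # clearly positive
```

Because the estimate is unbiased, it can be slightly negative when the two samples come from the same distribution.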

May 20: Guest lectures by Dingli Yu (Tensor Programs) and Kaifeng Lyu (grokking)
Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks


May 27: Gaussian processes and kernel regression, random Fourier features, kernel generalization bounds, neural tangent kernel
A Primer on Reproducing Kernel Hilbert Spaces; Universality, characteristic kernels and RKHS embedding of measures; Gaussian Processes and Kernel Methods: A Review on Connections and Equivalences; Rademacher and Gaussian complexities: Risk bounds and structural results; Learning Kernels Using Local Rademacher Complexity; Fine-Grained Analysis of Optimization and Generalization for Overparameterized Two-Layer Neural Networks
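Random Fourier features approximate a shift-invariant kernel with an explicit finite-dimensional feature map, so kernel regression can be replaced by linear regression on the features. Below is a minimal sketch for the Gaussian kernel k(x, y) = exp(-||x - y||² / (2σ²)), assuming the standard construction z(x) = sqrt(2/D) cos(Wx + b) with the rows of W drawn from N(0, σ⁻² I) and b uniform on [0, 2π]; the function names and the choice D = 2000 are illustrative.

```python
import math
import random
from typing import List, Tuple


def make_rff(dim: int, num_features: int, sigma: float,
             rng: random.Random) -> Tuple[List[List[float]], List[float]]:
    """Sample frequencies W ~ N(0, sigma^-2 I) and phases b ~ U[0, 2*pi]."""
    W = [[rng.gauss(0.0, 1.0 / sigma) for _ in range(dim)] for _ in range(num_features)]
    b = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return W, b


def features(x: List[float], W: List[List[float]], b: List[float]) -> List[float]:
    """Map x to z(x) = sqrt(2 / D) * cos(W x + b)."""
    scale = math.sqrt(2.0 / len(b))
    return [scale * math.cos(sum(wj * xj for wj, xj in zip(w, x)) + bi)
            for w, bi in zip(W, b)]


def rbf_kernel(x: List[float], y: List[float], sigma: float) -> float:
    """Exact Gaussian kernel exp(-||x - y||^2 / (2 * sigma^2))."""
    sq = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq / (2.0 * sigma ** 2))


if __name__ == "__main__":
    rng = random.Random(0)
    sigma = 1.0
    W, b = make_rff(dim=2, num_features=2000, sigma=sigma, rng=rng)
    x, y = [0.3, -0.5], [1.0, 0.2]
    approx = sum(zx * zy for zx, zy in zip(features(x, W, b), features(y, W, b)))
    print("exact kernel:", rbf_kernel(x, y, sigma))
    print("RFF estimate:", approx)  # error shrinks roughly like 1 / sqrt(D)
```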

Jun 3: Self-contrastive learning: Word2Vec, DeepWalk, relation between self-contrastive learning and spectral clustering
References:
[Book] Introduction to Online Convex Optimization
[Book] Prediction, Learning, and Games
[Book] Options, Futures, and Other Derivatives
[Book] Advances in Financial Machine Learning
[Book] Convex Optimization
[Book] Foundations of Machine Learning
[Book] Understanding Machine Learning: From Theory to Algorithms
Python is the default programming language we will use in the course.
If you haven't used it before, don't worry. It is very easy to learn (if you know any other programming language) and very efficient to work with, especially for prototyping things related to scientific computing and numerical optimization. Python code is usually much shorter than C/C++ code (the language does a lot of the work for you). It is also more flexible and generally faster than MATLAB.
A standard combination for this class is Python + numpy (a numerical library for Python) + scipy (a scientific computing library for Python) + matplotlib (for generating nice plots).
An easier option is to install Anaconda, a free Python distribution that bundles the most popular packages.
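A quick way to check that the standard combination (numpy, scipy, matplotlib) is available, for example after installing Anaconda, is to ask Python whether each package can be found. The sketch below uses only the standard library (`importlib.util.find_spec`); the helper name `check_stack` is illustrative, not part of any course tooling.

```python
import importlib.util
import sys
from typing import Dict, Iterable

# Packages in the suggested stack (import names, not necessarily pip names).
STACK = ("numpy", "scipy", "matplotlib")


def check_stack(packages: Iterable[str] = STACK) -> Dict[str, bool]:
    """Return {package_name: True if importable in this interpreter, else False}."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, ok in check_stack().items():
        status = "ok" if ok else f"MISSING (try: pip install {name})"
        print(f"{name:<12}{status}")
```

`find_spec` only locates a package without importing it, so the check is fast and does not fail if one of the packages is missing.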