ATCS – Selected Topics in Learning, Prediction, and Optimization (with applications in Finance)

2021 Spring

Lecturer: Jian Li ( lapordge at gmail dot com)

TA: Yitong Duan, Liang Zeng

time: every Monday 9:50am-12:15pm

Room: 4303, Fourth Teaching Building (四教4303)

We intend to cover a subset of the following topics (tentative):

(1) I assume you already know all basics (convex optimization and machine learning, stochastic gradient descent, gradient boosting, deep learning basics, CNN, RNN, please see my undergrad course). If you don't know much machine learning (e.g., you do not know how to derive the dual of SVM yet), please do NOT take this course. I will recall some concepts briefly when necessary.

(2) online learning/optimization, multi-armed bandit, statistical learning theory, theory of deep learning

I won't strictly follow the above order: I may skip some topics mentioned above and cover some that are not. It is a graduate course.

I will be talking about several applications of ML and optimization in finance (trading, pricing derivatives, etc.), and of course in typical CS areas such as vision, NLP, and social networks as well.

I will teach about 2/3 of the classes. For the rest, I will choose some topics and students will give class presentations.

Tentative topics for class presentations: generative models (GAN), adversarial learning and robustness, unsupervised learning (co-training, pseudo-labeling, contrastive learning), meta-learning, AutoML, and various financial applications.

Some knowledge of convex optimization may be useful. See this course (by S. Boyd) and a previous course of my own. But it will be fine if you haven't taken those courses. Basic machine learning knowledge is a must.
Andrew Ng's undergrad lecture notes

The course is a blend of theory and practice. We will cover the underlying mathematics as well as interesting heuristics.



  1. Homework (35 pts; one assignment every two or three weeks, including 2 coding assignments this semester)
  2. 5 pts for taking notes: Each student should take notes for at least one lecture (maybe two), using LaTeX (use this template: sample.tex, algorithm2e.sty).
  3. 5 pts for the student presentation: The last 3-4 classes will be student presentations. Each student needs to present for around 30 minutes. I will provide some candidate topics/papers.
  4. Course projects (55 pts: 10 pts for the mid-term report, 10 pts for the final presentation, 35 pts for the final report)
  5. No exam. 




1.Feb 22 Basics of convex optimization
Strong convexity, smoothness
Convergence analysis of Gradient Descent
Basics of Online learning, regret
Online gradient descent
  1. [Book] Convex Optimization, Sections 9.1-9.3
  2. Online convex programming and generalized infinitesimal gradient ascent. M. Zinkevich
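As a quick numerical check of the gradient descent convergence analysis, here is a minimal sketch on a toy quadratic (all constants made up for illustration): with step size 1/L, the iterates contract toward the minimizer at a linear rate governed by mu/L.

```python
import numpy as np

# Gradient descent on f(x) = 0.5 x^T A x - b^T x (strongly convex and smooth).
# With step size 1/L (L = largest eigenvalue of A), the error contracts by a
# factor of at most (1 - mu/L) per step, where mu is the smallest eigenvalue.
A = np.array([[3.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(A, b)          # the unique minimizer

L = max(np.linalg.eigvalsh(A))          # smoothness constant
x = np.zeros(2)
for _ in range(200):
    grad = A @ x - b
    x = x - grad / L                    # constant step size 1/L

print(np.linalg.norm(x - x_star))       # essentially 0
```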


2.Mar 1  

The expert problem
Hedge Algorithm (or Multiplicative weights method)
Second-order method: Squint
Small loss bound, Quantile bound, Stochastic Setting
Basics of stock and futures markets (part 1)

  1. Multiplicative weights method: a meta-algorithm and its applications (a survey). Sanjeev Arora, Elad Hazan, and Satyen Kale. [pdf] The method can be used to approximately solve zero-sum games and linear programs, and is also closely related to AdaBoost.
  2. Our exposition of the expert problem follows this note
  3. The Squint paper by Koolen and Van Erven
  4. Our exposition follows this note
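The Hedge algorithm itself is only a few lines of code. A minimal sketch (losses drawn at random rather than adversarially, with one artificially good expert; the learning rate follows the standard sqrt(log N / T) tuning):

```python
import numpy as np

# Hedge (multiplicative weights) on the expert problem: play the normalized
# weights, then exponentially downweight experts that incur loss.
# Regret vs. the best expert is O(sqrt(T log N)).
rng = np.random.default_rng(0)
N, T = 10, 2000
eta = np.sqrt(np.log(N) / T)            # standard learning rate

weights = np.ones(N)
alg_loss = 0.0
cum_loss = np.zeros(N)                  # cumulative loss of each expert
for t in range(T):
    p = weights / weights.sum()
    losses = rng.random(N)              # in a real instance, chosen by an adversary
    losses[0] *= 0.1                    # make expert 0 clearly the best
    alg_loss += p @ losses
    cum_loss += losses
    weights *= np.exp(-eta * losses)

regret = alg_loss - cum_loss.min()
print(regret / T)                       # small average regret
```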
3.Mar 8 Some concepts in convex optimization: Fenchel conjugate, Bregman divergence
Mirror Descent
Online Mirror Descent
Basics of stock and futures markets (part 2)
  1. Online mirror descent (see this note or this(chapter 5))
  2. A proof of Pinsker's inequality (see this book) (note that this is equivalent to saying that negative entropy is strongly convex w.r.t. L1)
  3. Options, Futures and Other Derivatives Chapter 3.
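The link between the negative-entropy regularizer and Pinsker's inequality runs through the fact that, on the simplex, the Bregman divergence of negative entropy is exactly the KL divergence. A quick numerical check of that identity:

```python
import numpy as np

# Bregman divergence of phi(p) = sum_i p_i log p_i on the simplex equals the
# KL divergence -- the key fact behind entropic mirror descent.
def neg_entropy(p):
    return np.sum(p * np.log(p))

def grad_neg_entropy(p):
    return np.log(p) + 1.0

def bregman(phi, grad_phi, x, y):
    # D_phi(x, y) = phi(x) - phi(y) - <grad phi(y), x - y>
    return phi(x) - phi(y) - grad_phi(y) @ (x - y)

x = np.array([0.2, 0.3, 0.5])
y = np.array([0.4, 0.4, 0.2])
kl = np.sum(x * np.log(x / y))
print(bregman(neg_entropy, grad_neg_entropy, x, y), kl)   # equal
```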
4.Mar 15  Online Mirror Descent, Follow the Regularized Leader
Adversarial Bandit (EXP3)
Upper Confidence Bound
  1. Finite-time Analysis of the Multiarmed Bandit Problem

  2. The non-stochastic multi-armed bandit problem

  3. Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems

Selected Reading: OMD (this survey chapter 5)
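A minimal sketch of UCB1 on Bernoulli arms (toy arm means chosen for illustration): each arm's index is its empirical mean plus a sqrt(2 log t / n_i) confidence bonus, which yields logarithmic regret in T.

```python
import numpy as np

# UCB1: pull the arm maximizing (empirical mean + confidence bonus).
rng = np.random.default_rng(1)
means = np.array([0.2, 0.5, 0.8])       # arm 2 is best (made-up values)
K, T = len(means), 5000

counts = np.zeros(K)
sums = np.zeros(K)
for t in range(T):
    if t < K:
        arm = t                          # pull each arm once to initialize
    else:
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        arm = int(np.argmax(ucb))
    reward = float(rng.random() < means[arm])
    counts[arm] += 1
    sums[arm] += reward

print(counts)                            # the best arm is pulled most often
```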
5.Mar 22 Conjugate Prior
Thompson Sampling
Median Elimination
Contextual Bandit (EXP4)
eps-Greedy (Langford-Zhang)
  1. A Tutorial on Thompson Sampling

  2. The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits

  3. Taming the monster: A fast and simple algorithm for contextual bandits
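For Bernoulli rewards, the conjugate Beta prior makes Thompson sampling especially simple: the posterior update is just a success/failure count. A minimal sketch with made-up arm means:

```python
import numpy as np

# Thompson sampling for a 2-armed Bernoulli bandit with Beta(1, 1) priors:
# sample a mean from each arm's posterior and pull the argmax.
rng = np.random.default_rng(2)
means = np.array([0.3, 0.6])             # arm 1 is best (made-up values)
T = 3000

alpha = np.ones(2)                       # Beta posterior parameters
beta = np.ones(2)
for _ in range(T):
    theta = rng.beta(alpha, beta)        # one posterior sample per arm
    arm = int(np.argmax(theta))
    reward = float(rng.random() < means[arm])
    alpha[arm] += reward                 # conjugate update: count successes...
    beta[arm] += 1.0 - reward            # ...and failures

pulls = alpha + beta - 2.0
print(pulls)                             # the better arm dominates
```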

6.Mar 29 Gaussian process

Bayesian optimization

A sketch of the regret analysis of GP-UCB

Brownian Motion, Ito integral

  1. Practical Bayesian Optimization of Machine Learning Algorithms
  2. Gaussian process optimization in the bandit setting: No regret and experimental design

Selected reading:

[Book] Gaussian Processes for Machine Learning
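The core computation behind GP regression (and hence behind GP-UCB's posterior mean) is a single linear solve. A minimal sketch with an RBF kernel and made-up hyperparameters:

```python
import numpy as np

# GP regression posterior mean at test points Xs:
#   mean = k(Xs, X) (K + sigma^2 I)^{-1} y
def rbf(A, B, ell=0.2):
    d2 = (A[:, None] - B[None, :]) ** 2
    return np.exp(-d2 / (2 * ell**2))

X = np.linspace(0, 1, 8)                 # training inputs
y = np.sin(2 * np.pi * X)                # noiseless targets for illustration
sigma2 = 1e-4                            # small jitter / noise variance

K = rbf(X, X) + sigma2 * np.eye(len(X))
Xs = np.array([0.25, 0.75])
mean = rbf(Xs, X) @ np.linalg.solve(K, y)
print(mean)                              # close to sin(2 pi x) at the test points
```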

7.Apr 5 Holiday, no class  
8.Apr 12 Ito calculus

Introduction to options

Binomial tree, Delta-hedge


Corresponding chapters in [Book] Options, Futures and Other Derivatives

Optional reading:

A rigorous and thorough treatment of SDEs (including the construction of Brownian motion):

[Book] Stochastic Calculus, Filtering, and Stochastic Control
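Binomial-tree pricing is a good first exercise: build the risk-neutral lattice, then discount backward. A minimal sketch of the Cox-Ross-Rubinstein tree for a European call (all parameters made up for illustration):

```python
import numpy as np

# Cox-Ross-Rubinstein binomial tree for a European call: risk-neutral
# up-probability on an up/down lattice, then discounted backward induction.
def crr_call(S0, K, r, sigma, T, n):
    dt = T / n
    u = np.exp(sigma * np.sqrt(dt))
    d = 1.0 / u
    p = (np.exp(r * dt) - d) / (u - d)   # risk-neutral up probability
    j = np.arange(n + 1)                 # number of up-moves at maturity
    ST = S0 * u**j * d**(n - j)          # terminal stock prices
    V = np.maximum(ST - K, 0.0)          # terminal payoffs
    for _ in range(n):                   # one discounting step per level
        V = np.exp(-r * dt) * (p * V[1:] + (1 - p) * V[:-1])
    return V[0]

price = crr_call(S0=100, K=100, r=0.05, sigma=0.2, T=1.0, n=500)
print(price)                             # converges to the BS price (about 10.45)
```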


9.Apr 19 Black-Scholes-Merton (BSM) equation, BS formula for the European call

Risk-neutral valuation

Volatility Smile

Feynman-Kac, Fokker-Planck, Langevin dynamics

Corresponding chapters in [Book] Options, Futures and Other Derivatives

SDE and PDE: [Book]Arbitrage Theory in Continuous Time, Chapter 5

Optional reading: the Fokker-Planck equation
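The closed-form BS price for a European call is short enough to implement directly; a minimal sketch using the standard normal CDF via erf (inputs made up for illustration):

```python
from math import erf, exp, log, sqrt

# Black-Scholes price of a European call:
#   C = S Phi(d1) - K e^{-rT} Phi(d2),  Phi(x) = (1 + erf(x / sqrt(2))) / 2
def bs_call(S, K, r, sigma, T):
    d1 = (log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    Phi = lambda x: 0.5 * (1.0 + erf(x / sqrt(2.0)))
    return S * Phi(d1) - K * exp(-r * T) * Phi(d2)

print(bs_call(100, 100, 0.05, 0.2, 1.0))   # about 10.45
```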

10.Apr 26 Maxima of Gaussian Process.

Subgaussian process

Dudley theorem (chaining)

Symmetrization, Rademacher complexity

[Book] Probability in High Dimension, Chapter 5

[Book] Foundations of Machine Learning, Chapter 3

Advanced reading: Talagrand's generic chaining, [Book] Probability in High Dimension, Chapter 6

11.May 6 VC-dimension, Pseudo-dimension, Fat-Shattering dimension

Algorithmic stability, generalization of GD (convex and strongly convex cases)

[Book] Probability in High Dimension, Chapter 5

Train faster, generalize better: Stability of stochastic gradient descent

12.May 10 PAC-Bayesian framework

generalization of SGLD

Generalization bounds based on the PAC-Bayesian framework


  1. Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data
  2. Generalization Bounds of SGLD for Non-convex Learning
  3. On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning
13.May 17 Learnability of convex functions.

A brief survey of convex optimization (in particular, the finite-sum form)


Convex parts can be found in [Book] Understanding Machine Learning: From Theory to Algorithms 

A Primer on Reproducing Kernel Hilbert Spaces

14.May 24 RKHS, Universal Kernel

Maximum Mean Discrepancy (MMD)

Equivalence between kernel ridge regression (KRR) and Gaussian process regression

Integral operator

Local Rademacher complexity


  1. A Primer on Reproducing Kernel Hilbert Spaces

  2. Universality, characteristic kernels and RKHS embedding of measures

  3. A Kernel Two-Sample Test

  4. Rademacher and Gaussian complexities: Risk bounds and structural results

  5. On Learning with Integral Operators

  6. Learning Kernels Using Local Rademacher Complexity

  7. To understand deep learning we need to understand kernel learning
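The (biased) empirical MMD estimate from the kernel two-sample test is just a three-term kernel average; a minimal sketch with Gaussian samples (all parameters made up for illustration):

```python
import numpy as np

# Biased empirical estimate of squared MMD with an RBF kernel:
#   MMD^2(P, Q) = E k(x, x') + E k(y, y') - 2 E k(x, y)
rng = np.random.default_rng(3)

def rbf(A, B, gamma=0.5):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

def mmd2(X, Y):
    return rbf(X, X).mean() + rbf(Y, Y).mean() - 2 * rbf(X, Y).mean()

X = rng.normal(0.0, 1.0, size=(500, 1))
Y_same = rng.normal(0.0, 1.0, size=(500, 1))   # same distribution as X
Y_diff = rng.normal(2.0, 1.0, size=(500, 1))   # shifted distribution
print(mmd2(X, Y_same), mmd2(X, Y_diff))        # near 0 vs. clearly positive
```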


15.May 31 Margin-based bounds

Semi-supervised learning





[Book] Introduction to online convex optimization

[Book] Prediction, Learning, and Games

[Book] Options, Futures and Other Derivatives  

[Book] Advances in Financial Machine Learning

[Book] Convex Optimization

[Book] Foundations of Machine Learning

[Book] Probability in High Dimension

[Book] Understanding Machine Learning: From Theory to Algorithms

Lecture notes for STAT928: Statistical Learning and Sequential Prediction 

Python is the default programming language we will use in the course.

If you haven't used it before, don't worry. It is very easy to learn (if you know any other programming language), and it is a very productive language, especially for prototyping things related to scientific computing and numerical optimization. Python code is usually much shorter than C/C++ code (the language does a lot for you). It is also more flexible and generally faster than Matlab.

A standard combination for this class is Python + numpy (a numerical lib for Python) + scipy (a scientific computing lib for Python) + matplotlib (for generating nice plots).

Another, somewhat easier, way is to install Anaconda (a free Python distribution that bundles most popular packages).
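To give a flavor of the numpy style the assignments will use, here is a tiny example (vectorized arrays and a least-squares solve instead of explicit loops; the data are made up):

```python
import numpy as np

# Fit a line y = a x + b by least squares on noiseless synthetic data.
x = np.linspace(0, 1, 50)
y = 2.0 * x + 1.0                        # true slope 2, intercept 1
A = np.vstack([x, np.ones_like(x)]).T    # design matrix [x, 1]
slope, intercept = np.linalg.lstsq(A, y, rcond=None)[0]
print(slope, intercept)                  # recovers 2.0 and 1.0
```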