ATCS - Selected Topics in Learning, Prediction, and Optimization (with applications in Finance)

2020 Spring

Lecturer: Jian Li ( lapordge at gmail dot com)

TA: Tianping Zhang ( ztp18 at )

Time: every Monday 9:50am-12:15pm

Room: TBD  (we use ZOOM for now)

We intend to cover a subset of the following topics (tentative):

(1) I assume you already know the basics (convex optimization and machine learning, stochastic gradient descent, gradient boosting, deep learning basics, CNNs, RNNs; please see my undergrad course). If you don't know any machine learning, I do not suggest you take this course. I will recall some concepts briefly when necessary.

(2) online learning, multi-armed bandit, statistical learning theory, theory of deep learning

I won't strictly follow the above order. I may skip some of the topics mentioned above and cover some topics not mentioned above; it is a graduate course.

I will be talking about several applications of ML and optimization in Finance (trading, pricing derivatives, etc.), and of course in typical CS areas such as vision, NLP, and social networks as well.

I will teach about 2/3 of the classes. For the rest, I will choose some topics and students will give class presentations.

Tentative topics for class presentation: GAN, adversarial training, federated learning, AutoML, various financial applications.

Some knowledge of convex optimization may be useful. See this course (by S. Boyd) and a previous course by myself, but it is fine if you did not take those courses. Basic machine learning knowledge is a must.
Andrew Ng's undergrad lecture notes

The course is a blend of theory and practice. We will cover both the underlying mathematics and interesting heuristics.

A similar previous course.


  1. Homeworks (40pts, 1 homework every two or three weeks. We will have 2 coding assignments for this semester)
  2. 5 pts for taking notes: Each student should take notes for at least one lecture (maybe two), using LaTeX (use this template: sample.tex, algorithm2e.sty).
  3. Course projects (55 pts. 10pts for mid-report, 10pts for final presentation, 35pts for final report)
  4. No exam. 




Feb 17 Basics of Online learning

The expert problem.

Multiplicative weights method

Online gradient descent

  1. Multiplicative weights method: a meta-algorithm and its applications. (A survey) Sanjeev Arora, Elad Hazan, and Satyen Kale. [pdf] The method can be used to approximately solve zero-sum games and linear programs, and is also closely related to AdaBoost.
  2. Our exposition of the expert problem follows this note
  3. Online convex programming and generalized infinitesimal gradient ascent. M. Zinkevich
  4. Selected Reading: Tracking the Best Expert (this can deal with dynamic regret)
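As a minimal sketch of the multiplicative weights (Hedge) update for the expert problem: maintain one weight per expert, play the normalized weights, and multiply each weight by an exponential factor of its loss. The function name and interface below are illustrative, not taken from the lecture notes.

```python
import numpy as np

def multiplicative_weights(loss_matrix, eta=0.1):
    """Hedge / multiplicative weights on the expert problem.

    loss_matrix: T x N array, loss_matrix[t, i] = loss of expert i at round t,
    assumed to lie in [0, 1]. Returns the learner's total expected loss.
    """
    T, N = loss_matrix.shape
    w = np.ones(N)                           # one weight per expert
    total_loss = 0.0
    for t in range(T):
        p = w / w.sum()                      # play distribution proportional to weights
        total_loss += p @ loss_matrix[t]
        w *= np.exp(-eta * loss_matrix[t])   # multiplicative (exponential) update
    return total_loss
```

With eta of order sqrt(log N / T), the regret against the best single expert is O(sqrt(T log N)), which is the bound proved in class.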
Feb 24 Basics of Stock and Futures Markets.

Cover's universal portfolio.

2nd order bound

Online Mirror Descent: Fenchel conjugate function

(closely related to other notions of duality, such as the polar dual and the dual norm)

  1. Universal Portfolios

  2. Universal Portfolios With and Without Transaction Costs

  3. The Squint paper by Koolen and Van Erven

  4. Selected reading: Online Newton Step (see Chapter 4 of this book); it can solve the universal portfolio problem
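Cover's universal portfolio is the wealth-weighted average over all constant-rebalanced portfolios (CRPs), which for uniform initial weighting makes the universal wealth the average CRP wealth over the simplex. A Monte Carlo sketch of that average (sampling the simplex via a Dirichlet distribution; names and interface are illustrative):

```python
import numpy as np

def universal_portfolio_wealth(price_relatives, n_samples=2000, seed=0):
    """Monte Carlo approximation of Cover's universal portfolio wealth.

    price_relatives: T x m array, entry [t, i] = price ratio of asset i on day t.
    The universal wealth equals the average wealth over all CRPs under the
    uniform prior, approximated here by sampling points from the simplex.
    """
    rng = np.random.default_rng(seed)
    T, m = price_relatives.shape
    b = rng.dirichlet(np.ones(m), size=n_samples)  # sampled portfolios on the simplex
    daily = price_relatives @ b.T                  # T x n_samples daily growth factors
    wealth = daily.prod(axis=0)                    # final wealth of each sampled CRP
    return wealth.mean()                           # uniform average = universal wealth
```

The exact algorithm integrates over the simplex; the sampling here is only to make the definition concrete.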
Mar 2 Online Mirror Descent

Interval regret, Sleeping Expert, Adaptive Regret

  1. Online mirror descent (see this note or this(chapter 5))

  2. selected reading: OMD is nearly optimal in a certain sense

Mar 9 Bandit:
stochastic: UCB
adversarial: EXP3

Pure exploration: Median Elimination

  1. Finite-time Analysis of the Multiarmed Bandit Problem

  2. The non-stochastic multi-armed bandit problem

  3. Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems

Selected Reading: OMD (this survey chapter 5)
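A compact sketch of UCB1 from the "Finite-time Analysis" reference above: pull each arm once, then always pull the arm maximizing the empirical mean plus a confidence radius. The `pull` callback interface is an assumption for illustration.

```python
import numpy as np

def ucb1(pull, n_arms, horizon):
    """UCB1 for stochastic bandits. `pull(i)` returns a reward in [0, 1]
    for arm i. Returns the number of pulls of each arm."""
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    for i in range(n_arms):                  # initialize: pull each arm once
        means[i] = pull(i)
        counts[i] = 1
    for t in range(n_arms, horizon):
        ucb = means + np.sqrt(2 * np.log(t + 1) / counts)  # optimism bonus
        i = int(np.argmax(ucb))
        r = pull(i)
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]   # incremental mean update
    return counts
```

The confidence radius shrinks like sqrt(log t / counts), so suboptimal arms are pulled only O(log T / gap^2) times, which gives the logarithmic regret bound discussed in class.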
Mar 16 Thompson Sampling

Contextual bandit: EXP4


  1. A Tutorial on Thompson Sampling

  2. The Epoch-Greedy Algorithm for Contextual Multi-armed Bandits

  3. Taming the monster: A fast and simple algorithm for contextual bandits

Selected Reading:

  1. Olivier Chapelle and Lihong Li. An empirical evaluation of Thompson sampling. In NIPS, 2011.
  2. Russo, D. and B. Van Roy. 2014. "Learning to optimize via posterior sampling". Mathematics of Operations Research 39(4): 1221-1243. (Regret analysis of Thompson sampling, via a connection to UCB)
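For Bernoulli rewards with Beta priors, Thompson sampling has a particularly short implementation: sample a mean from each arm's posterior, play the argmax, and update the conjugate posterior. A minimal sketch (names are illustrative):

```python
import numpy as np

def thompson_bernoulli(true_means, horizon, seed=0):
    """Thompson sampling with independent Beta(1, 1) priors on Bernoulli arms.
    Returns the number of pulls of each arm (simulation only)."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    alpha = np.ones(k)                       # 1 + observed successes
    beta = np.ones(k)                        # 1 + observed failures
    pulls = np.zeros(k)
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)        # one posterior sample per arm
        i = int(np.argmax(theta))            # play the arm that looks best
        r = rng.random() < true_means[i]     # Bernoulli reward
        alpha[i] += r
        beta[i] += 1 - r
        pulls[i] += 1
    return pulls
```

The posterior sampling step is what distinguishes this from UCB: exploration comes from posterior randomness rather than an explicit confidence bonus.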
Mar 23 Hyperparameter optimization


Bayesian Optimization


Introduction to Stock and Futures Markets

Efficient Market Hypothesis

  1. Hyperband: A novel bandit-based approach to hyperparameter optimization

  2. Practical Bayesian Optimization of Machine Learning Algorithms
  3. Online to Batch

Selected Reading:

Gaussian process optimization in the bandit setting: No regret and experimental design
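The core subroutine of Hyperband is successive halving: evaluate many configurations cheaply, keep the best fraction, and give the survivors more budget. A sketch under an assumed `evaluate(cfg, budget)` interface (lower loss is better); the interface and names are not from the paper:

```python
import numpy as np

def successive_halving(configs, evaluate, min_budget=1, eta=3, rounds=3):
    """Successive halving, the inner loop of Hyperband.

    `evaluate(cfg, budget)` returns a loss (lower is better) for a
    hyperparameter configuration trained with the given budget.
    """
    budget = min_budget
    survivors = list(configs)
    for _ in range(rounds):
        losses = [evaluate(c, budget) for c in survivors]
        keep = max(1, len(survivors) // eta)      # keep the top 1/eta fraction
        order = np.argsort(losses)[:keep]
        survivors = [survivors[i] for i in order]
        budget *= eta                             # survivors get eta times more budget
    return survivors[0]
```

Hyperband proper runs several such brackets with different trade-offs between the number of configurations and the starting budget, hedging against not knowing how quickly losses reveal the best configuration.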

Mar 30 Digression: the Martingale convergence theorem

Brief Intro to Futures and Options

No arbitrage argument

Basics of Brownian Motion

Ito Calculus

  See corresponding chapters in Options, Futures and Other Derivatives 
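To make the Brownian motion and Ito calculus material concrete, here is a simulation sketch: build paths by summing independent Gaussian increments, and observe that the quadratic variation of each path concentrates at T, the fact behind the (dW)^2 = dt rule of Ito calculus. Function name and defaults are illustrative.

```python
import numpy as np

def brownian_paths(T=1.0, n_steps=1000, n_paths=5000, seed=0):
    """Simulate standard Brownian motion on [0, T] on a uniform grid
    by cumulatively summing N(0, dt) increments."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    return np.cumsum(dW, axis=1)   # W_t at the grid points, one row per path

# The quadratic variation sum_t (dW_t)^2 is close to T on every path,
# even though sum_t |dW_t| blows up as dt -> 0.
```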
April 6 Break. No class.
Apr 13. Pricing Derivatives

Binomial Tree model

Black-Scholes Merton (BSM) model

Pricing derivatives via no-regret online learning

  1. See corresponding chapters in Options, Futures and Other Derivatives 
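A short sketch of the binomial tree (Cox-Ross-Rubinstein) model for a European call: build the terminal prices, apply the payoff, and roll back with risk-neutral probabilities. Names are illustrative; the construction follows the standard textbook treatment.

```python
import numpy as np

def binomial_call(S0, K, r, sigma, T, n):
    """European call price in the CRR binomial tree, by backward induction
    under the risk-neutral measure."""
    dt = T / n
    u = np.exp(sigma * np.sqrt(dt))          # up factor
    d = 1 / u                                # down factor
    q = (np.exp(r * dt) - d) / (u - d)       # risk-neutral up probability
    disc = np.exp(-r * dt)
    # terminal stock prices S0 * u^j * d^(n-j), j = 0..n
    S = S0 * u ** np.arange(n + 1) * d ** np.arange(n, -1, -1)
    V = np.maximum(S - K, 0.0)               # payoff at maturity
    for _ in range(n):                       # roll back one step at a time
        V = disc * (q * V[1:] + (1 - q) * V[:-1])
    return float(V[0])
```

As n grows, the tree price converges to the Black-Scholes-Merton price, which is how the BSM formula is derived as a limit in class.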
Apr 20 Relation between SDE and PDE

Feynman-Kac formula

Risk neutral valuation representation of BS formula

Fokker-Planck equation

Robust option pricing using online learning

Volatility smile

  1. Online Trading Algorithms and Robust Option Pricing
  2. Learning, Regret Minimization and Option Pricing
  3. Robust option pricing: Hannan and Blackwell meet Black and Scholes

Selected Reading:

  1. How to Hedge an Option Against an Adversary: Black-Scholes Pricing is Minimax Optimal
  2. Minimax Option Pricing Meets Black-Scholes in the Limit
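The risk-neutral valuation representation of the BS formula says the price is the discounted expected payoff under the risk-neutral measure, which suggests a direct Monte Carlo check: simulate the terminal stock price under GBM with drift r and average the discounted payoff. A sketch (names and defaults are illustrative):

```python
import numpy as np

def mc_call_price(S0, K, r, sigma, T, n_paths=200_000, seed=0):
    """Risk-neutral Monte Carlo price of a European call: sample the
    terminal GBM price exactly, then discount the average payoff."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal(n_paths)
    # exact terminal price under the risk-neutral measure
    ST = S0 * np.exp((r - 0.5 * sigma ** 2) * T + sigma * np.sqrt(T) * Z)
    return float(np.exp(-r * T) * np.maximum(ST - K, 0.0).mean())
```

The estimate converges to the closed-form BSM price at the usual 1/sqrt(n_paths) Monte Carlo rate.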

Apr 27 Quick review of Classical Statistical Learning Theory

Rademacher complexity

Maxima of (sub)Gaussian process


Packing/Covering number


There are many good references for these classical results.

I mainly followed the very well-written [lecture notes] Probability in High Dimension by van Handel (Chapters 5 and 7)

[Book] Neural Network Learning: Theoretical Foundations is also a very good source.
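The empirical Rademacher complexity of a class F on a sample x_1..x_n is E_sigma[ sup_f (1/n) sum_i sigma_i f(x_i) ] with i.i.d. random signs sigma_i. For a finite class given by its value matrix, this expectation can be estimated by Monte Carlo; the sketch below uses illustrative names.

```python
import numpy as np

def empirical_rademacher(values, n_draws=5000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of a
    finite function class.

    values: k x n array, values[j, i] = f_j(x_i) on a fixed sample of size n.
    Estimates E_sigma[ max_j (1/n) sum_i sigma_i * values[j, i] ].
    """
    rng = np.random.default_rng(seed)
    k, n = values.shape
    sigma = rng.choice([-1.0, 1.0], size=(n_draws, n))  # Rademacher signs
    return float((sigma @ values.T).max(axis=1).mean() / n)
```

A singleton class has complexity 0 (the sign average cancels), while richer classes that can correlate with more sign patterns have larger complexity, which is exactly what the generalization bounds penalize.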

May 4 Labor day break, class moved to May 9
May 9

Pseudo-dimension, Fat-Shattering dimension

Rademacher complexity

Relation between these dimensions and covering/packing numbers.

  1. [Book] Neural Network Learning: Theoretical Foundations (Chapter 11)
  2. Rademacher and Gaussian complexities: Risk bounds and structural results

  3. Scale-sensitive dimensions, uniform convergence, and learnability

Selected reading:

  1. Local Rademacher Complexities

  2. Fast rates in statistical and online learning
May 11 Classical VC Bounds of Neural Networks

(overview) generalization bounds in deep learning

Rethinking generalization

Spectral norm generalization bound

Generalization via Compression

  1. [Book] Neural Network Learning: Theoretical Foundations (Chapter 7,8)
  2. Understanding deep learning requires rethinking generalization
  3. Spectrally-normalized margin bounds for neural networks
  4. Stronger generalization bounds for deep nets via a compression approach

Selected Reading:

  1. A pac-bayesian approach to spectrally-normalized margin bounds for neural networks

  2. Non-vacuous generalization bounds at the imagenet scale: a PAC-bayesian compression approach

  3. Computing nonvacuous generalization bounds for deep (stochastic) neural networks with many more parameters than training data

  4. Uniform convergence may be unable to explain generalization in deep learning

May 18 RKHS

MMD, Universality, integral operator

generalization bound for kernel method

(overview) why existing (kernel) generalization bounds are not enough

  1. A Primer on Reproducing Kernel Hilbert Spaces

  2. [Book] Learning with Kernels. Support Vector Machines, Regularization, Optimization, and Beyond. By Bernhard Schölkopf and Alexander J. Smola, Chapter 1 and 2
  3. Universality, characteristic kernels and RKHS embedding of measures

  4. A Kernel Two-Sample Test

Selected reading:

  1. On Learning with Integral Operators
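As a concrete companion to the MMD and kernel two-sample test material, here is the (biased) plug-in estimate of squared MMD with a Gaussian kernel: the mean within-sample kernel values minus twice the cross-sample mean. Names and the `gamma` parameterization are illustrative.

```python
import numpy as np

def mmd2_biased(X, Y, gamma=1.0):
    """Biased estimate of squared MMD between samples X and Y with the
    Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    def gram(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)  # pairwise ||a - b||^2
        return np.exp(-gamma * d2)
    return gram(X, X).mean() + gram(Y, Y).mean() - 2 * gram(X, Y).mean()
```

Because the Gaussian kernel is characteristic, the population MMD is zero iff the two distributions coincide, which is what makes the statistic usable as a two-sample test.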


May 25 student presentation group 1 About Double Descent
  1. Reconciling modern machine-learning practice and the classical bias-variance trade-off

  2. Deep double descent: Where bigger models and more data hurt

  3. The generalization error of random features regression: Precise asymptotics and double descent curve

  4. Surprises in high-dimensional ridgeless least squares interpolation

  5. Optimal regularization can mitigate double descent

Jun 1 student presentation group 2 About Kernels
  1. To understand deep learning we need to understand kernel learning

  2. Rademacher and Gaussian complexities: Risk bounds and structural results

  3. Learning Kernels Using Local Rademacher Complexity

  4. Neural Tangent Kernel
  5. Training Neural Networks as Learning Data-Adaptive Kernels
Jun 8 Project presentation


[Book] Introduction to online convex optimization

[Book] Prediction, Learning, and Games

[Book] Options, Futures and Other Derivatives  

[Book] Advances in Financial Machine Learning

[Book] Foundations of Machine Learning

[Book] Understanding Machine Learning: From Theory to Algorithms

Lecture notes for STAT928: Statistical Learning and Sequential Prediction 

Python is the default programming language we will use in the course.

If you haven't used it before, don't worry. It is very easy to learn (if you know any other programming language) and very efficient, especially for prototyping things related to scientific computing and numerical optimization. Python code is usually much shorter than C/C++ code (the language does a lot for you). It is also more flexible and generally faster than MATLAB.

A standard combination for this class is Python + numpy (a numerical library for Python) + scipy (a scientific computing library for Python) + matplotlib (for generating nice plots).

Another, somewhat easier, way is to install Anaconda (a free Python distribution bundled with the most popular packages).
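As a small taste of the Python + numpy workflow mentioned above, here is a complete least-squares fit in a few lines, with no explicit loops (the data and names are made up for illustration):

```python
import numpy as np

# Fit y = a * x + b to noisy data with one vectorized numpy call.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 100)
y = 2.0 * x + 1.0 + 0.01 * rng.standard_normal(100)   # true a = 2, b = 1

A = np.column_stack([x, np.ones_like(x)])             # design matrix [x | 1]
coef, *_ = np.linalg.lstsq(A, y, rcond=None)          # solves min ||A c - y||^2
```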