ATCS ¨C Learning and Prediction ¨C Theory and Practice
Lecturer: Jian Li
TA: Wei Cao email@example.com
Room: FIT bldg 1-222
time: every monday 1:30pm-3:55pm
1. cocktail party problem: We are given a sample speech data consisting of people talking simultaneously in a room. How can we separate them? (check out independent component analysis)
2. Netflix challenge:
We are given users' rates of a set of movies (as in the following matrix). But many entries of the matrix
is empty (the user didn't see/rate the movie). How can we predict the empty entries?
Check out Matrix completion, Collaborative filtering.
3. How to invest in a portfolio of stocks, such that you are no worse than investing the best stock in the long run?
(check out Universal Portfolio)
4. Given a collection of documents (newspaper article, twits, weibo, customer reviews...), how can we extract the topics (without knowning which topics they are about)?
2. What is the answer of this? Man - Woman + King = ? Answer: Queue. (check out word2vec)
3. What about this? (Your algorithm can generate the last fig! Check out CNN+visual analogy)
How can you use machine to generate hand-writing like this?
Try more examples in
Check out RNN (LSTM)
Given an image, say something about it.
How to make your algorithm "paint" like a great artist!
We intend to cover a subset of the following topics (tentative):
(1) quickly recall basics about convex optimization, linear/logistic regression, regularization, newton method, stochastic gradient descent (asychronized, variance reduction method), generative vs discrimitive, variance vs bias
(2) Off-the-shelf machine learning and prediction algorithms: k-nn, SVM, kernel trick, clustering, Adaboost, gradient boosting, random forest, basic graphical models (including naive bayes, CRF, LDA)
(3) Online learning. Multi-armed bandit, Universal portfolio, Multiplicative weighting method
(4) linear algebra-based learning algorithms: SVD, principle
component analysis (PCA), independent component analysis (ICA), Nonnegative
matrix factorization (NMF), topic modeling, matrix completion, dictionary
learning, tensor method, spectral clustering
(5) Deep learning: generative models like DBN, Back propagation, auto-encoder, most of time CNN and RNN (LSTM), neural turing machine, (attention-based?), MANY AMUZING APPLICATIONS, possibly some newest developments.
(Maybe): word2vec, hashing/sketching, multi-task learning, community detection,....
Some knowledge about convex optimization may be useful. See John¡¯s class http://itcs.tsinghua.edu.cn/~john/convex2013.html and a previous course by myself. But it will be fine if you didn't take those courses.
The course is a blending of theory and practice. We will cover both the underlying mathematics and existing tools for quick and proper implementations. Writing code is a necessary part of this course.
|Feb.22||Brief review of probability (expectation, variance, covariance, concentration inequality, central limit theorem), Linear regression, SVD, PCA||a PCA tutorial|
|Mar.1||Finishing PCA. Eigenface. Frequentist and Bayesian. Generative & discrimitive models. A probabilistic interpetation of linear regression, A bayesion approach to linear regress, regularization, Convex programming (basics, projected gradient descent, stochastic gradient descent).||
(A small monongraph) Convex Optimization: Algorithms and Complexity, by Bubeck
(a classic textbook) S. Boyd and L. Vandenberghe, Convex Optimization
|Mar.8||Newton method, Logistic regression, perceptron, how to handle huge feature vectors (count-min sketch).||The Count-Min Sketch|
|Mar.14||KKT condition, SVM, kernel trick, Geometry of SVM, Reproducing Hilbert Space, How to scale up kernel method (Random Fourier Features, Hashing)||Random Features for Large-Scale Kernel Machines
Feature Hashing for Large Scale Multitask Learning
Duality and Geometry in SVM classifier
Core Vector Machines
Coreset for Polytope Distance
|Mar.21||Topic modeling. Nonnegative matrix factorization. Anchor
word assumption and an algorithm.
Conjugate prior (binomial/beta, multinomial/dirichlet), Dirichlet distributions. LDA (latent dirichlet distribution) and variants.
Chapter 2 in lecture
notes (by Moitra)
Learning Topic Models - Going beyond SVD
|Mar. 28||Independent Component Analysis
(Kurtosis, Negentropy, Mutual information)
Start Sparse Coding and Dictionary learning (LASSO, L0 and L1, compressive sensing, matching pursuit, othogonal matching pursuit)
Independent Component Analysis: Algorithms and Applications
The element of statistical learning, Ch 14.7
For compressive sensing: the following chapter is a good starting point:
Notes from my previous course
|Apr. 4||Holiday, no class||¡¡|
(Mention)Some compressive sensing basics
Mention (matrix completion, robust PCA)
Sparse Coding and Dictionary learning
K-SVD algorithm, online dictionary learning. how to use DL to denoise, image restoration
CART-Classification or Regression Trees
K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse
Online dictionary learning for sparse coding
CART: The element of statistical learning, Ch 9.2.
LASSO, Least angle regression: The element of statistical learning, Ch 3.4.4.
Matrix completion: last Chapter in lecture notes (by Moitra)
mention( bootstrap, bagging)
Gradient boosting, random forests.
|Additive models, Adaboost, gradient boosting:
The element of statistical learning, Ch 9 and 10
Random forest: The element of statistical learning, Ch 9 and 10
Some learning theory about boosting: Foundation of Machine learning, Ch. 6.
XGBoost: A Scalable Tree Boosting System
A lot of materials in Awesome Random Forest http://jiwonkim.org/awesome-random-forest/
Deep learning, BP, CNN
(a brief intro to Theano, by Wei Cao)
Quick overiew of CNN http://cs231n.github.io/convolutional-networks/
Lots of small training tricks in
|May. 2||No class (labor day)||¡¡|
Visualizing higher layers of CNN (Deconvnet), some CNN architechtures
(Alexnet, GoogleNet, ResNet).
Visualizing and understanding CNN
Deep Residual Learning for Image Recognition, Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun
RNN chapter in the deep learning book
Awesome RNN https://github.com/kjw0612/awesome-rnn
An online RNN tutorial
|May. 16||Word2Vec, Glove, deepwalk||Distributed Representations of Words
and Phrases and their Compositionality
Glove: Global vectors for word representation.
INFINITE DIMENSIONAL WORD EMBEDDINGS, Nalisnick, Ravi
Random Walks on Context Spaces: Towards an Explanation of the Mysteries of Semantic Word Embeddings.
Neural Turing Machine
A Neural Algorithm of Artistic Style
Neural Turing Machines, arXiv preprint arXiv:1410.5401, A.Graves, G. Wayne, and I. Danihelka.
EXPLAINING AND HARNESSING ADVERSARIAL EXAMPLES
End-to-End memory network (for Q&A) (by Wei Cao)
Deep learning for video processing/understanding (by Chuang Gan)
Basics of RL, MDP, Value iteration, policy iteration, Q-learning, Combining DNN with policy gradient,
Q-learning : Neuro-Dynamic Programming, Ch 5.6
A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning
A good blog post on Policy Gradient by Andrej Karpathy
RL course by David Silver
Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning (code and videos)
Slides on Deep Q-learning
Related papers (under construction..):
Universal Portfolios, Thomas Cover
Universal Portfolios With and Without Transaction Costs, Avrim Blum, Adam Kalai.
The Multiplicative Weights Update Method: a Meta-Algorithm and Applications, Sanjeev Arora, Elad Hazan, Satyen Kale
An Empirical Exploration of Recurrent Network Architectures
Deep Visual Analogy-Making, by ScottReed Yi Zhang Yuting Zhang Honglak Lee
Deep Visual-Semantic Alignments for Generating Image Description, Andrej Karpathy and Li Fei-Fei
Striving for Simplicity: The All Convolutional Net, Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, Martin Riedmiller
Spatial Transformer Networks: Max Jaderberg Karen Simonyan Andrew Zisserman Koray Kavukcuoglu
FractalNet: Ultra-Deep Neural Networks without Residuals
[Book] The elements of statistical learning
[Book] Pattern recognition and machine learning
[Book] Probabilistic Graphical Models
Alex Smola¡¯s class at CMU http://alex.smola.org/teaching/cmu2013-10-701x/
Andrew Ng¡¯s class at Stanford https://www.coursera.org/learn/machine-learning
Nando de Freitas¡¯s class at Oxford https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/
Carlos Guestrin / Emily Fox¡¯s class at UW http://courses.cs.washington.edu/courses/cse599c1/13wi/lectures.html
Richard Socher's course at Stanford: http://cs224d.stanford.edu/index.html
Feifei Li, Andrej Karpathy and Justin Johnson's class at Standford: http://cs231n.stanford.edu/
Ankur Moitra's (MIT) lecture notes (Algorithmic machine learning) lecture notes
Python is the default programming language we will use in the course.
If you haven't use it before, don't worry. It is very easy to learn (if you know any other programming language), and is a very efficient language, especially
for prototyping things related scientific computing and numerical optimization. Python codes are usually much shorter than C/C++ code (the lauguage has done a lot for you). It is also more flexible and generally faster than matlab.
A standard combination for this class is Python+numpy (a numeric lib for python)+scipy (a scientific computing lib for python)+matplotlab (for generating nice plots)
Another somewhat easier way is to install Anaconda (it is a free Python distribution with most popular packages).