ATCS – Learning and Prediction – Theory and Practice
2016 Spring
Lecturer: Jian Li
TA: Wei Cao fatboy_cw@163.com
Room: FIT Bldg 1-222
Time: every Monday, 1:30pm–3:55pm
1. The cocktail party problem: we are given a speech recording of several people talking simultaneously in a room. How can we separate the individual voices? (Check out independent component analysis; a toy separation sketch appears right after this list.)
2. We are given users' ratings of a set of movies, arranged in a users-by-movies matrix. But many entries of the matrix are empty (the user didn't see or rate the movie). How can we predict the empty entries? (Check out matrix completion and collaborative filtering.)
3. How can you invest in a portfolio of stocks such that, in the long run, you do no worse than investing everything in the single best stock? (Check out universal portfolios.)
4. Given a collection of documents (newspaper articles, tweets, weibo posts, customer reviews, ...), how can we extract the topics they discuss (without knowing in advance which topics they are about)? (Check out topic modeling.)
5. What is the answer to this: King - Man + Woman = ? Answer: Queen. (Check out word2vec.)
6. Visual analogy making: given an example pair of images related by some transformation and a new query image, your algorithm can generate the analogous output image! (Check out CNNs + visual analogy making.)
7. How can you use a machine to generate handwriting like a human? Try more examples at
http://www.cs.toronto.edu/~graves/handwriting.cgi?text=abc&style=&bias=0.15&samples=3
(Check out RNNs, in particular LSTM.)
8. Given an image, say something about it (image captioning).
9. How can you make your algorithm "paint" like a great artist?
10. Inceptionism!
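A toy version of the cocktail party problem (item 1) fits in a few lines with scikit-learn's FastICA. This is a minimal sketch on synthetic waveforms, not real speech: the two sources and the mixing matrix below are invented purely for illustration.

# Toy cocktail party: unmix two synthetic sources with FastICA.
# The sources and the mixing matrix A are made up for illustration.
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                    # "speaker" 1: a sinusoid
s2 = np.sign(np.sin(3 * t))           # "speaker" 2: a square wave
S = np.c_[s1, s2]                     # true sources, shape (2000, 2)

A = np.array([[1.0, 0.5],             # unknown mixing matrix
              [0.4, 1.0]])
X = S @ A.T                           # two "microphone" recordings

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)          # recovered sources, up to permutation and scale

for i in range(2):                    # one column should correlate near 1 with each source
    print(abs(np.corrcoef(S_est[:, i], s1)[0, 1]),
          abs(np.corrcoef(S_est[:, i], s2)[0, 1]))

For item 5, if you have pretrained word vectors on disk, the analogy query is a one-liner in gensim. The file name below is a placeholder for whatever vectors you download, not a file shipped with the course.

# Word analogy with pretrained vectors via gensim; 'vectors.bin' is a placeholder path.
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format('vectors.bin', binary=True)
print(kv.most_similar(positive=['king', 'woman'], negative=['man'], topn=1))
# With the standard GoogleNews vectors, the top answer is 'queen'.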
We intend to cover a subset of the following topics (tentative):
(1) Quickly recall basics of convex optimization, linear/logistic regression, regularization, Newton's method, stochastic gradient descent (asynchronous variants, variance reduction methods), generative vs. discriminative models, bias vs. variance. (A minimal SGD sketch follows this list.)
(2) Off-the-shelf machine learning and prediction algorithms: k-NN, SVM, the kernel trick, clustering, AdaBoost, gradient boosting, random forests, basic graphical models (including naive Bayes, CRF, LDA)
(3) Online learning: multi-armed bandits, universal portfolios, the multiplicative weights method
(4) Linear algebra-based learning algorithms: SVD, principal component analysis (PCA), independent component analysis (ICA), nonnegative matrix factorization (NMF), topic modeling, matrix completion, dictionary learning, tensor methods, spectral clustering
(5) Deep learning: generative models such as DBNs, backpropagation, autoencoders, and (most of the time) CNNs and RNNs (LSTM), neural Turing machines, (attention-based models?), many amazing applications, and possibly some of the newest developments.
(Maybe): word2vec, hashing/sketching, multi-task learning, community detection, ...
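As a taste of item (1), here is stochastic gradient descent for least-squares linear regression in plain numpy. This is a minimal sketch: the synthetic data, step size, and epoch count are arbitrary choices for illustration, not tuned settings.

# Minimal SGD for least-squares linear regression; a sketch, not a tuned solver.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.1 * rng.normal(size=n)   # noisy linear targets

w = np.zeros(d)
eta = 0.01                                  # step size (arbitrary choice)
for epoch in range(20):
    for i in rng.permutation(n):            # one pass over the data in random order
        grad = (X[i] @ w - y[i]) * X[i]     # gradient of (1/2)(x_i^T w - y_i)^2
        w -= eta * grad

print(np.linalg.norm(w - w_true))           # should be small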
Some knowledge of convex optimization may be useful. See John's class http://itcs.tsinghua.edu.cn/~john/convex2013.html and a previous course by myself, but it is fine if you haven't taken those courses.
The course is a blend of theory and practice. We will cover both the underlying mathematics and existing tools for quick and proper implementations. Writing code is a necessary part of this course.
Grading:
Schedule:
Date | Topics | Readings
Feb.22 | Brief review of probability (expectation, variance, covariance, concentration inequalities, the central limit theorem), linear regression, SVD, PCA | A PCA tutorial. Further reading: The Singular Value Decomposition, Applications, and Beyond (PDF) by Zhihua Zhang (SJTU) |
Mar.1 | Finishing PCA. Eigenfaces. Frequentist vs. Bayesian. Generative vs. discriminative models. A probabilistic interpretation of linear regression, a Bayesian approach to linear regression, regularization, convex programming (basics, projected gradient descent, stochastic gradient descent). | Eigenfaces paper; Asynchronous Stochastic Optimization. Further reading: (a small monograph) Convex Optimization: Algorithms and Complexity, by Bubeck; (a classic textbook) S. Boyd and L. Vandenberghe, Convex Optimization |
Mar.8 | Newton's method, logistic regression, the perceptron, how to handle huge feature vectors (count-min sketch; a toy implementation appears after the schedule). | The Count-Min Sketch |
Mar.14 | KKT conditions, SVM, the kernel trick, the geometry of SVM, reproducing kernel Hilbert spaces, how to scale up kernel methods (random Fourier features, hashing) | Random Features for Large-Scale Kernel Machines; Feature Hashing for Large Scale Multitask Learning. Further reading: Duality and Geometry in SVM Classifiers; Core Vector Machines; Coresets for Polytope Distance |
Mar.21 | Topic modeling. Nonnegative matrix factorization. The anchor word assumption and an algorithm based on it. Conjugate priors (binomial/beta, multinomial/Dirichlet), Dirichlet distributions. LDA (latent Dirichlet allocation) and variants. | Chapter 2 in lecture notes (by Moitra). Further reading: Learning Topic Models - Going beyond SVD |
Mar. 28 | Independent component analysis (kurtosis, negentropy, mutual information). Start sparse coding and dictionary learning (LASSO, L0 and L1, compressive sensing, matching pursuit, orthogonal matching pursuit). | Independent Component Analysis: Algorithms and Applications; The Elements of Statistical Learning, Ch. 14.7. Further reading: Learning Linear Transformations; Provable ICA with Unknown Gaussian Noise, and Implications for Gaussian Mixtures and Autoencoders. For compressive sensing, the following chapter is a good starting point: http://statweb.stanford.edu/~markad/publications/ddek-chapter1-2011.pdf ; Notes from my previous course |
Apr. 4 | Holiday, no class | |
Apr. 11 | (Mention) some compressive sensing basics; mention matrix completion and robust PCA. Sparse coding and dictionary learning: the K-SVD algorithm, online dictionary learning, how to use dictionary learning for denoising and image restoration. CART (classification and regression trees). Start boosting. | K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation; Online Dictionary Learning for Sparse Coding; Sparse Representation for Color Image Restoration. CART: The Elements of Statistical Learning, Ch. 9.2. Further reading: LASSO, least angle regression: The Elements of Statistical Learning, Ch. 3.4.4. Matrix completion: the last chapter in lecture notes (by Moitra) |
Apr. 18 | AdaBoost, mention (bootstrap, bagging), gradient boosting, random forests. | Additive models, AdaBoost, gradient boosting: The Elements of Statistical Learning, Ch. 9 and 10. Random forests: The Elements of Statistical Learning, Ch. 15. Further reading: some learning theory about boosting: Foundations of Machine Learning, Ch. 6; XGBoost: A Scalable Tree Boosting System; Explaining the Success of AdaBoost and Random Forests; lots of material in Awesome Random Forest http://jiwonkim.org/awesome-random-forest/ |
Apr. 26 | Start deep learning: backpropagation, CNNs (a brief intro to Theano, by Wei Cao) | Backpropagation notes http://cs231n.github.io/optimization-2/ ; quick overview of CNNs: http://cs231n.github.io/convolutional-networks/ . Further reading: Visualizing and Understanding CNNs; lots of small training tricks in http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf ; Practical Recommendations for Gradient-Based Training of Deep Architectures |
May. 2 | No class (Labor Day) | |
May. 9 | Visualizing higher layers of CNNs (deconvnet), some CNN architectures (AlexNet, GoogLeNet, ResNet). RNNs, LSTM | Visualizing and Understanding CNNs; Deep Residual Learning for Image Recognition, Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun; http://cs231n.stanford.edu/slides/winter1516_lecture10.pdf . Further reading: the RNN chapter in the Deep Learning book; Visualizing and Understanding Recurrent Networks; Awesome RNN https://github.com/kjw0612/awesome-rnn ; an online RNN tutorial |
May. 16 | Word2vec, GloVe, DeepWalk | Distributed Representations of Words and Phrases and their Compositionality; GloVe: Global Vectors for Word Representation; DeepWalk: Online Learning of Social Representations. Further reading: Infinite Dimensional Word Embeddings, Nalisnick, Ravi; Swivel: Improving Embeddings by Noticing What's Missing; Random Walks on Context Spaces: Towards an Explanation of the Mysteries of Semantic Word Embeddings |
May. 23 | Image reconstruction, DeepDream, neural style, neural Turing machines | Understanding Deep Image Representations by Inverting Them; A Neural Algorithm of Artistic Style; GitHub - google/deepdream (bat-country); Neural Turing Machines, arXiv preprint arXiv:1410.5401, A. Graves, G. Wayne, and I. Danihelka. Further reading: Explaining and Harnessing Adversarial Examples |
May. 30 | End-to-end memory networks (for Q&A) (by Wei Cao). Deep learning for video processing/understanding (by Chuang Gan) | [1503.08895] End-To-End Memory Networks |
Jun. 4 | Deep reinforcement learning: basics of RL, MDPs, value iteration, policy iteration, Q-learning, combining DNNs with policy gradients, DAGGER (a minimal value iteration sketch appears after the schedule). | Some classical methods for MDPs (value iteration, policy iteration, LP); Q-learning: Neuro-Dynamic Programming, Ch. 5.6; A Reduction of Imitation Learning and Structured Prediction to No-Regret Online Learning; a good blog post on policy gradients by Andrej Karpathy. Further reading: RL course by David Silver; Deep Learning for Real-Time Atari Game Play Using Offline Monte-Carlo Tree Search Planning (code and videos); slides on Deep Q-learning; Awesome Reinforcement Learning |
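Two of the scheduled topics are small enough to sketch directly in Python. First, the count-min sketch from the Mar. 8 lecture: a minimal pure-Python version, not any particular library's API. The width and depth below are arbitrary illustration values; in practice they are chosen from the desired error and confidence bounds.

# Minimal count-min sketch; width/depth are arbitrary illustration values.
import random

class CountMinSketch:
    def __init__(self, width=2048, depth=5, seed=0):
        rnd = random.Random(seed)
        self.width = width
        # One random salt per row, mixed into Python's hash to get per-row hashes.
        self.salts = [rnd.getrandbits(64) for _ in range(depth)]
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, salt, item):
        return hash((salt, item)) % self.width

    def add(self, item, count=1):
        for salt, row in zip(self.salts, self.table):
            row[self._index(salt, item)] += count

    def query(self, item):
        # Count-min guarantee: an overestimate, never an underestimate.
        return min(row[self._index(salt, item)]
                   for salt, row in zip(self.salts, self.table))

cms = CountMinSketch()
for word in ["the", "cat", "the", "hat", "the"]:
    cms.add(word)
print(cms.query("the"))   # >= 3 (exactly 3 unless hash collisions occur)

Second, value iteration from the Jun. 4 lecture, on a tiny made-up MDP: all transition probabilities and rewards below are invented purely for illustration.

# Minimal value iteration on a tiny invented MDP.
import numpy as np

n_states, n_actions, gamma = 3, 2, 0.9
# P[a, s, s'] = transition probability; R[s, a] = expected immediate reward.
P = np.array([[[0.8, 0.2, 0.0],
               [0.1, 0.8, 0.1],
               [0.0, 0.2, 0.8]],
              [[0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5],
               [0.0, 0.0, 1.0]]])
R = np.array([[0.0, 0.1],
              [0.0, 0.2],
              [1.0, 0.0]])

V = np.zeros(n_states)
for _ in range(1000):
    Q = R + gamma * (P @ V).T        # Bellman backup: Q[s,a] = R[s,a] + g*sum_s' P[a,s,s'] V[s']
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        V = V_new
        break
    V = V_new

print("V* =", V)
print("greedy policy:", Q.argmax(axis=1))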
Related papers (under construction...):
Universal Portfolios, Thomas Cover
Universal Portfolios With and Without Transaction Costs, Avrim Blum, Adam Kalai.
The Multiplicative Weights Update Method: a Meta-Algorithm and Applications, Sanjeev Arora, Elad Hazan, Satyen Kale
An Empirical Exploration of Recurrent Network Architectures
Deep Visual Analogy-Making, by Scott Reed, Yi Zhang, Yuting Zhang, Honglak Lee
Deep Visual-Semantic Alignments for Generating Image Descriptions, Andrej Karpathy and Li Fei-Fei
Striving for Simplicity: The All Convolutional Net, Jost Tobias Springenberg, Alexey Dosovitskiy, Thomas Brox, Martin Riedmiller
Spatial Transformer Networks: Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu
FractalNet: Ultra-Deep Neural Networks without Residuals
References:
[Book] The elements of statistical learning
[Book] Pattern recognition and machine learning
[Book] Probabilistic Graphical Models
Alex Smola's class at CMU http://alex.smola.org/teaching/cmu2013-10-701x/
Andrew Ng's class at Stanford https://www.coursera.org/learn/machine-learning
Nando de Freitas's class at Oxford https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/
Carlos Guestrin / Emily Fox's class at UW http://courses.cs.washington.edu/courses/cse599c1/13wi/lectures.html
Richard Socher's course at Stanford: http://cs224d.stanford.edu/index.html
Fei-Fei Li, Andrej Karpathy and Justin Johnson's class at Stanford: http://cs231n.stanford.edu/
Ankur Moitra's (MIT) lecture notes (Algorithmic Machine Learning)
Python is the default programming language we will use in the course.
If you haven't used it before, don't worry. It is very easy to learn (if you know any other programming language) and is a very efficient language, especially
for prototyping things related to scientific computing and numerical optimization. Python code is usually much shorter than C/C++ code (the language does a lot for you). It is also more flexible, and generally faster, than MATLAB.
A standard combination for this class is Python + numpy (a numerical library for Python) + scipy (a scientific computing library for Python) + matplotlib (for generating nice plots).
Another, somewhat easier, way is to install Anaconda (a free Python distribution that bundles most popular packages).
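To get a feel for this stack, here is PCA via SVD (the Feb. 22 topic) in plain numpy. A minimal sketch: the random matrix below is just a stand-in for whatever dataset an assignment would supply.

# PCA via SVD in plain numpy; the random data is a placeholder for a real dataset.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))            # 200 samples, 10 features
Xc = X - X.mean(axis=0)                   # center each feature

# Thin SVD of the centered data: Xc = U @ diag(S) @ Vt.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 3
components = Vt[:k]                       # top-k principal directions
Z = Xc @ components.T                     # data projected to k dimensions
explained = S[:k] ** 2 / np.sum(S ** 2)   # fraction of variance per component
print(Z.shape, explained)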