Advanced Theoretical Computer Science: Selected Topics in Convex, Combinatorial, and Stochastic Optimization
2014 Spring
Lecturer: Jian Li
The course covers selected topics in the extremely broad area of optimization. In particular, we will cover a subset of the following topics (tentative):
(1) Linear programming, and more generally convex programming, and their applications in combinatorial algorithms and machine learning. We will spend a few lectures on various randomized/deterministic rounding methods and the primal-dual method. (3-4 weeks) Linear programming in sparse recovery and LASSO. (1-2 weeks)
(2) Descent methods for convex optimization problems: traditional and stochastic gradient descent, subgradient methods, mirror descent, proximal methods, etc. (3-5 weeks)
(3) Online and stochastic learning and optimization: the experts problem, online learning and prediction, online convex optimization, multi-armed bandits (3 weeks); two-stage stochastic optimization, the sample average approximation method, stochastic knapsack, the secretary problem, prophet inequalities (3-5 weeks)
In fact, each of the above topics is a rich and deep area in its own right, so we will not have time to go very deep into any of them. Instead, I will select some representative problems in each area, cover them in some detail, and focus a bit more on their mutual connections and their relations to other theory and application areas. The idea is to give you a rough picture of these optimization areas, so that when you need any of them in your own research, you know which tools to use and which paper or book to consult.
If you have taken John's convex optimization class ( http://itcs.tsinghua.edu.cn/~john/convex2013.html ), you will see a lot of closely related material. Some parts of this course (especially the online & stochastic optimization part) are also related to Longbo's stochastic network optimization course. It is fine if you did not take those courses. Some of the algorithms we will cover, especially those for solving convex optimization problems, should also be very useful in machine learning research.
Grading:
Homework (30 pts): 25 pts for problem sets and 5 pts for taking notes
Each student should take scribe notes for at least one lecture, using LaTeX (use this template: sample.tex, algorithm2e.sty).
One take-home final, due at the end of the semester (30 pts)
One course project (40 pts). Project topics
Schedule:
Every Wednesday, 1:30pm-4:45pm (1:30-2:15, 2:20-3:05, 3:10-3:55, 4:00-4:45)
Feb 26  Overview. Linear programming, duality, totally unimodular matrices, introduction to approximation algorithms, vertex cover, set cover (randomized rounding, dual fitting). (Read this note for some basic probabilistic tools we will use later.)
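The randomized-rounding recipe from this first lecture (solve the set-cover LP relaxation, then pick each set with probability equal to its fractional value for O(log n) rounds) can be sketched in a few lines. The function below is a minimal illustration, not the exact algorithm from lecture; the name `lp_round_set_cover` and the final patching step are my own additions, and it assumes SciPy's LP solver for the relaxation.

```python
import math
import random

import numpy as np
from scipy.optimize import linprog

def lp_round_set_cover(sets, n_elements, seed=0):
    """Solve the set-cover LP relaxation, then round randomly.

    sets: list of sets of element ids in range(n_elements).
    Returns the indices of the chosen sets (always a valid cover).
    """
    m = len(sets)
    # LP relaxation: min sum_j x_j  s.t.  sum_{j: e in S_j} x_j >= 1,  0 <= x_j <= 1.
    A = np.zeros((n_elements, m))
    for j, S in enumerate(sets):
        for e in S:
            A[e, j] = 1.0
    # linprog expects A_ub @ x <= b_ub, so negate the covering constraints.
    res = linprog(np.ones(m), A_ub=-A, b_ub=-np.ones(n_elements), bounds=[(0, 1)] * m)
    x = res.x
    rng = random.Random(seed)
    chosen = set()
    # O(log n) rounds of independent rounding: pick set j with probability x_j.
    for _ in range(2 * math.ceil(math.log(n_elements + 1))):
        chosen.update(j for j in range(m) if rng.random() < x[j])
    # Patch any element the random rounds missed, so the output is always feasible.
    covered = set().union(*(sets[j] for j in chosen)) if chosen else set()
    for e in range(n_elements):
        if e not in covered:
            j = next(j for j, S in enumerate(sets) if e in S)
            chosen.add(j)
            covered |= set(sets[j])
    return chosen
```

The expected cost is O(log n) times the LP optimum, which the first lecture's analysis makes precise.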

Mar 5  Chernoff bounds, more independent randomized rounding (coin flipping, discrepancy O(sqrt(n log n)), congestion minimization, throughput maximization (1-1/e)), dependent rounding (congestion minimization with multiple paths, throughput maximization 3/4-approximation)
Mar 12  The number of nonzeros in a vertex LP solution. Scheduling on related machines. The Beck-Fiala discrepancy theorem. Primal-dual for the hitting set problem (set cover) with applications to vertex cover
Mar 19  Primal-dual for the hitting set problem (set cover) with applications to vertex cover, shortest path, and feedback vertex set. Primal-dual for the Steiner forest problem; primal-dual for facility location.
Mar 26  Guest Lecture (by Prof. Pingzhong Tang). Linear programming in game theory.  
Apr 2  Linear programming in compressive sensing: sparse models, spark, the null space property, the restricted isometry property, L1-minimization (exact recovery and noisy recovery).
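The L1-minimization (basis pursuit) problem from this lecture reduces directly to a linear program by splitting x = u - v with u, v >= 0. The sketch below (the helper name `l1_min` is mine, and it assumes SciPy's linprog) shows that reduction; with a random Gaussian measurement matrix and a sufficiently sparse signal, the lecture's exact-recovery theory says the LP recovers the signal exactly.

```python
import numpy as np
from scipy.optimize import linprog

def l1_min(A, b):
    """Basis pursuit: min ||x||_1 subject to Ax = b, written as an LP.

    Split x = u - v with u, v >= 0, and minimize 1^T (u + v) s.t. A(u - v) = b.
    """
    m, n = A.shape
    c = np.ones(2 * n)               # objective: sum of u and v entries
    A_eq = np.hstack([A, -A])        # A u - A v = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=[(0, None)] * (2 * n))
    return res.x[:n] - res.x[n:]
```

At the optimum, u and v have disjoint supports, so ||x||_1 = 1^T(u + v); this is the standard LP formulation of basis pursuit.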
Apr 9  (By Lingxiao Huang) Matrix completion (no proofs). Nonnegative matrix factorization using linear programming
Apr 16  Brief overview of convex optimization. Gradient descent. Exact line search and backtracking line search. Convergence analysis for strongly convex functions. Condition number. Steepest descent.
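Gradient descent with backtracking line search, as covered in this lecture, can be sketched as follows; this is a minimal illustration (the function name and the quadratic test problem are my own), using the standard Armijo condition with parameters alpha and beta.

```python
import numpy as np

def gradient_descent_backtracking(f, grad, x0, alpha=0.3, beta=0.5,
                                  tol=1e-8, max_iter=5000):
    """Gradient descent with backtracking (Armijo) line search.

    Shrink the step t by beta until f(x - t g) <= f(x) - alpha * t * ||g||^2,
    then take the step. Stops when the gradient norm falls below tol.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:
            break
        t = 1.0
        while f(x - t * g) > f(x) - alpha * t * g.dot(g):
            t *= beta  # backtrack until sufficient decrease holds
        x = x - t * g
    return x

# Example: an ill-conditioned quadratic f(x) = 0.5 x^T Q x (condition number 30).
Q = np.diag([1.0, 30.0])
f = lambda x: 0.5 * x.dot(Q.dot(x))
grad = lambda x: Q.dot(x)
x_star = gradient_descent_backtracking(f, grad, np.array([5.0, 2.0]))
```

On strongly convex functions the lecture's analysis gives a linear convergence rate whose constant degrades with the condition number, which the quadratic example above makes easy to observe.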
Apr 23  Newton's method: affine invariance, convergence of Newton's method (without proof). Convergence analysis of gradient descent for functions with Lipschitz gradient. The sqrt(condition number) lower bound for first-order methods. Conjugate gradient
May 7  Conjugate gradient (cont.), convergence analysis of CG via Chebyshev polynomials. A brief intro to some machine learning problems: regression, logistic regression, SVM, regularization. Stochastic gradient descent
May 14  Chebyshev iterative method (heavy ball method, an ODE interpretation, no proof). Stochastic gradient descent (cont.): analysis of SGD (fixed step size, decreasing step sizes, Johnson-Zhang variance reduction for decomposable objectives)
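Plain SGD with a decreasing step-size schedule, the setting of this lecture's analysis, can be sketched on a least-squares objective; the function below is a minimal illustration (the name `sgd_least_squares` and the particular schedule constants are my own choices), not the exact algorithm analyzed in class.

```python
import numpy as np

def sgd_least_squares(A, b, steps=20000, seed=0):
    """SGD on f(x) = (1/2n) ||Ax - b||^2, sampling one random row per step.

    Uses a decreasing O(1/t) step-size schedule, the kind used in the
    standard analysis for strongly convex objectives.
    """
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for t in range(steps):
        i = rng.integers(n)
        # Unbiased stochastic gradient of f at x: a_i (a_i^T x - b_i).
        g = A[i] * (A[i].dot(x) - b[i])
        eta = 1.0 / (0.1 * t + 10.0)  # decreasing step size eta_t ~ c/(t + t0)
        x -= eta * g
    return x
```

With a fixed step size the iterates instead converge only to a noise ball around the optimum; the decreasing schedule (or Johnson-Zhang variance reduction) is what removes that residual error.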
May 21  Attend Sanjeev Arora's talk  
May 28  A quick overview (no analysis) of other descent methods: projected gradient descent, mirror descent, Frank-Wolfe, Nesterov's optimal method. (Some methods we did not cover at all: proximity maps, smoothing, decomposition methods, ADMM, interior point.) Start of online optimization: the coin guessing problem, zero-sum games, Blackwell's approachability theorem
Jun 4  Blackwell's approachability theorem (cont.) and an application to online decision problems. The Multiplicative Weights algorithm (with applications to zero-sum games and solving LPs).
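The Multiplicative Weights algorithm for the experts problem fits in a few lines; the sketch below (the name `hedge` and the fixed learning rate eta are my own choices) plays the weighted mixture each round and multiplicatively penalizes each expert by its loss, which is the update analyzed in this lecture.

```python
import math

def hedge(losses, eta=0.5):
    """Multiplicative Weights (Hedge) for the experts problem.

    losses: list of per-round loss vectors with entries in [0, 1],
    one entry per expert. Returns the algorithm's cumulative expected loss;
    the regret against the best single expert is O(sqrt(T log n)).
    """
    n = len(losses[0])
    w = [1.0] * n                     # one weight per expert, initially uniform
    total = 0.0
    for loss in losses:
        W = sum(w)
        p = [wi / W for wi in w]      # play expert i with probability p_i
        total += sum(pi * li for pi, li in zip(p, loss))
        # Multiplicative update: penalize expert i by exp(-eta * loss_i).
        w = [wi * math.exp(-eta * li) for wi, li in zip(w, loss)]
    return total
```

Running the two-player zero-sum game dynamics with both players using this update converges to an approximate equilibrium, which is how the lecture uses it to approximately solve LPs.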
Jun 11  Follow the perturbed leader, the secretary problem, stochastic knapsack, stochastic matching
The scribe notes have not been polished yet, and we do not guarantee their correctness. Use them at your own risk.
References:
1. Combinatorial Optimization and Approximation Algorithms (from the CMU course):
Intro to Linear Programming (RR). [unedited pdf]
A survey by Bob Carr and Goran Konjevod on Polyhedral Combinatorics.
LP rounding algorithms and gaps for SetCover and MaxCoverage
Facility Location; constant factor LP rounding
Facility Location; primal-dual algorithms
Generalized Assignment Problems. (AG) [unedited pdf]
The original randomized rounding paper by Raghavan and Thompson
2. L0/L1 regularization, Sparse recovery
M. A. Davenport, M. F. Duarte, Y. C. Eldar, and G. Kutyniok, "Introduction to Compressed Sensing," in Compressed Sensing: Theory and Applications, Cambridge University Press, 2012.
l1-minimization, the Restricted Isometry Property. Notes: pdf (from Recht's course).
The original LASSO paper: Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Royal Statist. Soc. B, Vol. 58, No. 1, pages 267-288.
Candès, E. and Tao, T. The Dantzig selector: statistical estimation when p is much larger than n. Available at http://www.acm.caltech.edu/~emmanuel/papers/DantzigSelector.pdf
3. Gradient Methods
Vandenberghe's lecture notes
Rie Johnson and Tong Zhang. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, NIPS 2013.
Shai Shalev-Shwartz and Tong Zhang. Proximal Stochastic Dual Coordinate Ascent. Tech report arXiv:1211.2717; NIPS 2013.
On Decomposing the Proximal Map Y Yu, NIPS 2013
4. Online Learning and Optimization
Multiplicative weights method: a meta-algorithm and its applications (a survey). Sanjeev Arora, Elad Hazan, and Satyen Kale. [pdf] Accepted to Theory of Computing, 2010. The method can be used to approximately solve zero-sum games and linear programs, and is also closely related to AdaBoost.
Four proofs of Gittins' multi-armed bandit theorem. E. Frostig and G. Weiss.
UCB: Finite-time Analysis of the Multiarmed Bandit Problem. P. Auer, N. Cesa-Bianchi, and P. Fischer.
Efficient algorithms for online decision problems. A. Kalai and S. Vempala
Online convex programming and generalized infinitesimal gradient ascent. M. Zinkevich.
J.Y. Audibert, S. Bubeck and G. Lugosi, Regret in Online Combinatorial Optimization. To appear in Mathematics of Operations Research, 2013. [draft]
5. Stochastic Optimization
Related Courses:
Advanced Approximation Algorithms. Anupam Gupta and Ryan O'Donnell (CMU): http://www.cs.cmu.edu/~anupamg/advapprox/
Approximation Algorithms. Anupam Gupta and R. Ravi (CMU): http://www.cs.cmu.edu/afs/cs/academic/class/15854f05/www/
Online and Stochastic Optimization. Ashish Goel (Stanford): http://www.stanford.edu/~ashishg/msande325_09/
Convex Geometry in High-Dimensional Data Analysis. Ben Recht (Wisconsin): http://pages.cs.wisc.edu/~brecht/cs838.html
Optimization Methods for Large-Scale Systems. Prof. L. Vandenberghe (UCLA): http://www.seas.ucla.edu/~vandenbe/ee236c.html