无标题 1

Advance Theoretical Computer Science - Selected topics in convex, combinatorial and stochastic optimization

2014 Spring

The course covers some selected topics in the extremely broad area of optimization. In particular, we cover a subset of the following topics (tentative):

(1) Linear programming, and more generally convex programming and their applications in combinatorial algorithms and machine learning problems. We will spend a few lectures in various randomized/deterministic rounding methods and the primal-dual method. (3-4 weeks) Linear programming in sparse recovery and LASSO (1-2 weeks)
(2) Descent methods to solve convex optimization problems. Traditional and stochastic gradient descent,subgradient, mirror descent, proximal descent, etc. (3-5 weeks)
(3) Online and stochastic learning and Optimization. expert problem, online learning and prediction, online convex optimization, multi-arm bandit (3 weeks) 2-stage stochastic optimization, sample average method, stochastic knapsack, secretary, prophet inequality (3-5 weeks)

In fact, each of the above topic is a rich and deep area. So we will not have time to go much deeper in each of them. But instead, I will select some representative problems in each area and cover them in some details and focus a bit more on their mutual connections and relations to other theory and application areas. The idea is to give you a rough picture of these optimization areas, and when you need any of them in your own research, you know what tools to use and which paper/book to look.

If you have been to John's convex optimization class http://itcs.tsinghua.edu.cn/~john/convex2013.html , you will be seeing a lot of closely related stuffs. Some parts of this course (esp the online&stochastic optimization part) is also related to Longbo's stochastic network optimization course. But it will be fine if you didn't take those courses. Some of the algorithms we will cover, esp those for solving convex optimization problems, should also be very useful in machine learning research.

Grading:

Homeworks (30pts): 25pts for problems and 5pts for taking notes

Each student should take notes for at least one lecture, using LaTex (use this template sample.tex algorithm2e.sty).

1 take-home final, till the end of the semester (30pts)
1 course project (40pts) Project topics

Schedule:

Every Wed 1:30pm-4:45pm (1:30-2:15, 2:20-3:05, 3:10-3:55, 4:00-4:45)

Feb 26	Overview. Linear Programming, Duality, Totally Unimodular Matrix, Introduction to approximation algorithms, vertex cover, set cover (randomized rounding, dual fitting) (read this note for some basic probabilistic tools we will use later)	pdf
Mar 5	Chernoff Bounds, More independent randomized rounding (coin flipping, discrepancy-O(sqrt(n\logn)), congestion minimization, throughput maximization 1-1/e), dependent rounding (congestion minimization-multipath, throughput maximization 3/4-approx)	pdf
Mar 12	#nonzeros in a vertex LP solution. Scheduling related machine, Beck-Fiala's discrepancy theorem. Primal-dual for the hitting set problem (set cover) with applications to vertex cover	pdf
Mar 19	Primal-dual for the hitting set problem (set cover) with applications to vertex cover and shortest path and feedback vertex set, Primal dual for the steiner forest problem, primal-dual for facility location.	pdf
Mar 26	Guest Lecture (by Prof. Pingzhong Tang). Linear programming in game theory.
Apr 2	Linear programming in compressive sensing: Sparse models, spark, null space property, restricted isometry property, L1-minimization (exact recovery and noisy recovery).	pdf
Apr 9	(By Lingxiao Huang) Matrix completion (no proof). Nonnegative matrix factorization using linear programming
Apr 16	Brief overview of convex optimization. Gradient descent. Exact line search and backtrack line search. Convergence analysis for strongly convex functions. Condition number. Steepst descent.	pdf
Apr 23	Newton's method. Affine invariance, convergence of Newton (without proof). Convergence analysis of gradient descent for Lipschitz gradient functions. sqrt(condition number) lower bound for first order methods, Conjugate gradient	pdf
May 7	Conjugate gradient (cont.), Convergence analysis of CG using Chebyshev polynomials. A brief intro to some machine learning problems: regression, logistic regression, SVM, regularization, Stochastic gradient descent
May 14	Chebyshev iterative method (heavy ball method, an ODE intepretation, no proof), Stochastic gradient descent (cont.), Analysis of SGD (fixed steps, decreasing steps, Johnson-Zhang's variance reduction for decomposable objective)
May 21	Attend Sanjeev Arora's talk
May 28	A quick overview (no analysis) to other descent method (projected GD, mirror descent, Frank-Wolfe, Nesterov's optimal method. (Some methods we didn't mention at all: Proximity map, smoothing, decomposition method, ADMM, interior point). Start online optimization, coin guessing problem, zero-sum game, Blackwell approachability theorem
Jun 4	Blackwell approachability theorem (cont) and an application to online decision problem. Multiplicative Weighting Algorithm (and application to zero sum game and solving LP).
Jun 11	Follow the perturbed leader, Secretary problem, Stochastic knapsack, stochastic matching

The scribe notes are not polished yet and we do not guarantee correctness. Use them at your own risk.

References:

1. Combinatorial Optimization and Approximation Algorithms

from the CMU course:

Intro to Linear Programming (RR). [unedited pdf]

A survey by Bob Carr and Goran Konjevod on Polyhedral Combinatorics.

LP rounding algorithms and gaps for Set-Cover and Max-Coverage

Facility Location; constant factor LP rounding

Facility Location; primal-dual algorithms

Group Steiner Tree

Multicut

Generalized Assignment Problems. (AG) [unedited pdf]

The original randomized rounding paper by Raghavan and Thompson

Dependent rounding and its applications to approximation algorithms

2. L0/L1 regularization, Sparse recovery

M. A. Davenport, M. F. Duarte, Y. C. Eldar, and G. Kutyniok, "Introduction to Compressed Sensing," in Compressed Sensing: Theory and Applications, Cambridge University Press, 2012.

l1 minimization, Restricted Isometry Property. Notes: pdf. (from the Recht's course)

the original lasso paper: Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. Royal. Statist. Soc B., Vol. 58, No. 1, pages 267-288).

Cand`es E. and Tao T. The Dantzig selector: statistical estimation when p is much larger than n. Available at
http://www.acm.caltech.edu/~emmanuel/papers/DantzigSelector.pdf.

3.Gradient methods:

Vandenberghe's lecture notes

Rie Johnson and Tong Zhang. Accelerating Stochastic Gradient Descent using Predictive Variance Reduction, NIPS 2013.

Shai Shalev-Shwartz and Tong Zhang. Proximal Stochastic Dual Coordinate Ascent, Tech Report arXiv:1211.2717, NIPS 2013

On Decomposing the Proximal Map Y Yu, NIPS 2013

4. Online Learning and optimization

Multiplicative weights method: a meta-algorithm and its applications. (A survey) Sanjeev Arora, Elad Hazan, and Satyen Kale. [pdf] Accepted to Theory of Computing journal, 2010. The method can be used to solve approximately the zero-sum game and linear program, and is also closely related to Adaboost.

Four proofs of Gittins' multiarmed bandit theorem. E. Frostig and G. Weiss.

UCB Finite-time Analysis of the Multiarmed Bandit Problem. P. Auer, N. Cesa-Bianchi, and P. Fischer

Efficient algorithms for online decision problems. A. Kalai and S. Vempala

Online convex programming and generalized infinitesimal gradient ascent. M. Zinkevich.

J.Y. Audibert, S. Bubeck and G. Lugosi, Regret in Online Combinatorial Optimization. To appear in Mathematics of Operations Research, 2013. [draft]

5. Stochastic Optimization

Related Course:

Advanced Approximation Algorithms: Anupam Gupta and Ryan O'Donnell(CMU): http://www.cs.cmu.edu/~anupamg/adv-approx/

Approximation Algorithms Anupam Gupta and R. Ravi http://www.cs.cmu.edu/afs/cs/academic/class/15854-f05/www/

Online and Stochastic Optimization: Ashish Goel (stanford) http://www.stanford.edu/~ashishg/msande325_09/

Convex Geometry in High-Dimensional Data Analysis: Ben Recht(Wisconsin) http://pages.cs.wisc.edu/~brecht/cs838.html

Optimization Methods for Large-Scale Systems Prof. L. Vandenberghe, UCLA http://www.seas.ucla.edu/~vandenbe/ee236c.html