Publications:  ( All by topics / by dates) hide abstracts

Learning Theory (Learning Theory, Deep Learning Theory, Online Learning, Optimization)
AI in Finance (Fin-Tech, Quantitative Trading)
ML Applications (NLP, RL, CV, Traffic, etc)
Approximation Algorithms (for NP-hard scheduling, graph, geometry problems)
Computational Geometry
PTIME Problems (data structures, computational geometry)
Stochastic Optimization (stochastic combinatorial/geometry/online optimization)
Game Theory and Computational Economics
Databases
Networks
Others

Machine Learning Theory (Online learning, Deep Learning Theory, Optimization, Unsupervised learning and clustering algorithms)

  1. Simple Combinatorial Algorithms for Combinatorial Bandits: Corruptions and Approximations. Haike Xu, Jian Li. Uncertainty in Artificial Intelligence (UAI 2021). [paper]

  2. Improved Algorithms for Convex-Concave Minimax Optimization, Yuanhao Wang, Jian Li. 2020 Conference on Neural Information Processing Systems (NeurIPS 2020). [ArXiv]

  3. A Fast Anderson-Chebyshev Acceleration for Nonlinear Optimization. Zhize Li, Jian Li. The 23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020) [ArXiv] [ Show Abstract ]

  4. Gradient Descent Maximizes the Margin of Homogeneous Neural Networks. Kaifeng Lyu, Jian Li. 2020 International Conference on Learning Representations (ICLR2020, Oral) [ArXiv] [ Show Abstract ]

  5. On Generalization Error Bounds of Noisy Gradient Methods for Non-Convex Learning. Jian Li, Xuanyuan Luo, Mingda Qiao. 2020 International Conference on Learning Representations (ICLR2020) [ArXiv] [ Show Abstract ]

  6. Stochastic Gradient Hamiltonian Monte Carlo with Variance Reduction for Bayesian Inference. Zhize Li, Tianyi Zhang, Shuyu Cheng, Jun Zhu, Jian Li. Machine Learning, 2019. [ArXiv] [ Show Abstract ]

  7. A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization. Zhize Li, Jian Li. Thirty-second Conference on Neural Information Processing Systems (NeurIPS 2018 spotlight) [ArXiv] [ Show Abstract ]

  8. eps-Coresets for Clustering (with Outliers) in Doubling Metrics. Lingxiao Huang, Shaofeng H.-C. Jiang, Jian Li, Xuan Wu. The 59th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2018) [Full version in ArXiv] [ Show Abstract ]

  9. Nearly Instance Optimal Sample Complexity Bounds for Top-k Arm Selection. Lijie Chen, Jian Li, Mingda Qiao. The 20th International Conference on Artificial Intelligence and Statistics (AISTATS 2017). [full version in ArXiv]

  10. Nearly Optimal Sampling Algorithms for Combinatorial Pure Exploration. Lijie Chen, Anupam Gupta, Jian Li, Mingda Qiao, Ruosong Wang. In the 30th Annual Conference on Learning Theory (COLT 2017) [Paper] [ Show Abstract ]

  11. Towards Instance Optimal Bounds for Best Arm Identification. Lijie Chen, Jian Li, Mingda Qiao. In the 30th Annual Conference on Learning Theory (COLT 2017) [ArXiv] [ Show Abstract ]

  12. Combinatorial Multi-Armed Bandit with General Reward Functions, Wei Chen, Wei Hu, Fu Li, Jian Li, Yu Liu, Pinyan Lu. Neural Information Processing Systems (NIPS), 2016.

  13. K-Means Clustering with Distributed Dimension. Hu Ding, Yu Liu, Lingxiao Huang, Jian Li. The 33rd International Conference on Machine Learning (ICML 2016).

  14. Pure Exploration of Multi-armed Bandit Under Matroid Constraints. Lijie Chen, Anupum Gupta, Jian Li. Conference on Learning Theory (COLT 2016). [Full version]

  15. On Top-k Selection in Multi-Armed Bandits and Hidden Bipartite Graphs. Wei Cao, Jian Li, Zhize Li, Yufei Tao. Neural Information Processing Systems (NIPS), 2015. 

  16. Stochastic Online Greedy Learning with Semi-bandit Feedbacks. Tian Lin, Jian Li, Wei Chen. Neural Information Processing Systems (NIPS), 2015.

  17. Learning Arbitrary Statistical Mixtures of Discrete Distributions.  Jian Li, Yuval Rabani, Leonard J. Schulman, Chaitanya Swamy,  In ACM Symposium on the Theory of Computing (STOC 2015). [ArXiv]

  18. Optimal PAC Multiple Arm Identification with Applications to Crowdsourcing, Yuan Zhou, Xi Chen, Jian Li. International Conference on Machine Learning (ICML 2014).  [full version]

    • We study the problem of selecting K arms with the highest expected rewards in a stochastic N-armed bandit game.  We
      propose a new PAC algorithm, which, with probability at least 1-delta , identifies a set of K arms with average reward at most \epsilon away from the top-k arm. Naive uniform sampling requires O(nlnn) samples. We show it is possible to achieve linear sample complexity. We show this is also a lower bound (meaning our upper bound is worse-case optimal). 


AI+Finance

  1. FactorVAE: A Probabilistic Dynamic Factor Model Based on Variational Autoencoder for Predicting Cross-sectional Stock Returns. Yitong Duan, Lei Wang, Qizhong Zhang, Jian Li. AAAI Conference on Artificial Intelligence (AAAI 2022).

  2. DoubleEnsemble: A New Ensemble Method Based on Sample Reweighting and Feature Selection for Financial Data Analysis. Chuhang Zhang, Yuanqi Li, Xi Chen, Yifei Jin, Pingzhong Tang, Jian Li. The IEEE International Conference on Data Mining (ICDM 2020). [ArXiv]

  3. An Adaptive Master-Slave Regularized Model for Unexpected Revenue Prediction Enhanced with Alternative Data. Jin Xu, Jingbo Zhou, Jia Yongpo, Jian Li and Hui Xiong. 36th IEEE International Conference on Data Engineering (ICDE 2020).


Machine Learning (RL, NLP, GDBT, Network Embedding, meta-learning, Applications in traffics)

  1. FlipDA: Effective and Robust Data Augmentation for Few-Shot Learning. Jing Zhou, Yanan Zheng, Jie Tang, Jian Li, and Zhilin Yang. 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022). [ArXiv]

  2. Zheng, Yanan, Jing Zhou, Yujie Qian, Ming Ding, Jian Li, Ruslan Salakhutdinov, Jie Tang, Sebastian Ruder, and Zhilin Yang. FewNLU: Benchmarking state-of-the-art methods for few-shot natural language understanding. 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022). [ArXiv]

  3. LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition. Jin Xu, Xu Tan, Yi Ren, Tao Qin, Jian Li, Sheng Zhao, and Tie-Yan Liu. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (KDD 2020). [Paper]

  4. Policy Search by Target Distribution Learning for Continuous Control. Chuheng Zhang, Yuanqi Li, Jian Li. The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI 2020, Oral). [ArXiv]

  5. Relation Extraction with Temporal Reasoning Based on Memory Augmented Distant Supervision. Jianhao Yan, Lin He, Ruqin Huang, Jian Li and Ying Liu. Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2019). [paper] [ Show Abstract ]

  6. NetSMF: Large-Scale Network Embedding as Sparse Matrix Factorization. Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Chi Wang, Kuansan Wang and Jie Tang. The 2019 Web Conference (WWW 2019). Full paper (Oral). [paper] [ Show Abstract ]

  7. Gradient Boosting With Piece-Wise Linear Regression Trees. Yu Shi, Jian Li, Zhize Li. The 28th International Joint Conference on Artificial Intelligence (IJCAI 2019). [ArXiv] [ Show Abstract ]

  8. BRITS: Bidirectional Recurrent Imputation for Time Series. Wei Cao, Dong Wang,Jian Li, Hao Zhou, Lei Li, Yitan Li. Thirty-second Conference on Neural Information Processing Systems (NeurIPS 2018) [paper]

  9. SVM via Saddle Point Optimization: New Bounds and Distributed Algorithms. Lingxiao Huang, Yifei Jin and Jian Li. The Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2018). [ArXiv]

  10. When Will You Arrive? Estimating Travel Time Based on Deep Neural Networks. Dong Wang, Junbo Zhang, Wei Cao, Jian Li, Yu Zheng. The Thirty-Second AAAI Conference on Artificial Intelligence (AAAI 2018) [Paper] [Code and Data] [ Show Abstract ]

  11. Network Embedding as Matrix Factorization: Unifying DeepWalk, LINE, PTE, and node2vec. Jiezhong Qiu, Yuxiao Dong, Hao Ma, Jian Li, Kuansan Wang, Jie Tang. The 11th ACM International Conference on Web Search and Data Mining (WSDM 2018). [ArXiv] [ Show Abstract ]

  12. Learning Gradient Descent: Better Generalization and Longer Horizons. Kaifeng Lv, Shunhua Jiang, Jian Li. The 34th International Conference on Machine Learning (ICML 2017). [ArXiv] [ Show Abstract ][Code]

  13. DeepSD: Supply-Demand Prediction for Online Car-hailing Services using Deep Neural Networks. Dong Wang, Wei Cao, Jian Li, Jieping Ye. The 33th IEEE International Conference on Data Engineering (ICDE 2017). [paper]

  14. DESTPRE : A Data-Driven Approach to Destination Prediction for Taxi Rides. Mengwen Xu, Dong Wang, Jian Li. UbiComp 2016. [ Paper ]

  15. Trinary-Projection Trees for Approximate Nearest Neighbor Search. Jingdong Wang, Naiyan Wang, You Jia; Jian Li, Gang Zeng, Hongbin Zha, Xian-Sheng Hua. The IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2013 [Paper]


Stochastic Optimization (stochastic combinatorial optimization problems, stochastic geometry problems, stochastic online optimization, Bandits)

  1. Maximizing Expected Utility for Stochastic Combinatorial Optimization Problems; Jian Li, and Amol Deshpande; Mathematics of Operations Research (MOR), Vol. 44, No. 1, 2018. Preliminary version in Proceedings of the 52nd Annual IEEE Symposium on Foundations of Computer Science (FOCS 2011), Palm Springs, California, 2011. [Paper] [ArXiv]. [ Show Abstract ]

  2. Stochastic k-Center and j-Flat-Center Problems. Lingxiao Huang, Jian Li. ACM-SIAM Symposium on Discrete Algorithms (SODA17). [ArXiv]

  3. Combinatorial Multi-Armed Bandit with General Reward Functions, Wei Chen, Wei Hu, Fu Li, Jian Li, Yu Liu, Pinyan Lu. Neural Information Processing Systems (NIPS), 2016.

  4. Pure Exploration of Multi-armed Bandit Under Matroid Constraints. Lijie Chen, Anupum Gupta, Jian Li. Conference on Learning Theory (COLT 2016). [Full version]

  5. [Survey] Approximation Algorithms for Stochastic Combinatorial Optimization Problems. Jian Li and Yu Liu. Journal of the Operations Research Society of China. (invited article) [paper]

  6. On the Optimal Sample Complexity for Best Arm Identification. Lijie Chen, Jian Li. [ArXiv]

  7. On Top-k Selection in Multi-Armed Bandits and Hidden Bipartite Graphs. Wei Cao, Jian Li, Yufei Tao, Zhize Li.  Neural Information Processing Systems (NIPS), 2015. [full paper]

  8. Stochastic Online Greedy Learning with Semi-bandit Feedbacks. Tian Lin, Jian Li, Wei Chen. Neural Information Processing Systems (NIPS), 2015. [full paper]

  9. Approximating the Expected Values for Combinatorial Optimization Problems over Stochastic Points. Lingxiao Huang, Jian Li. The 42nd International Colloquium on Automata, Languages, and Programming (ICALP 2015), to appear. [ArXiv]  subsumes a previous draft: Minimum Spanning Trees, Perfect Matchings and Cycle Covers Over Stochastic Points in Metric Spaces.

  10. epsilon-Kernel Coresets for Stochastic Points. Jian Li, Jeff Phillips and Haitao Wang The 24rd Annual European Symposium on Algorithms (ESA 2016). [AxXiv]

    • We consider the standard stochastic geometry model in which the existence or the location of each point may be stochastic. We show there exists constant-size kernel coreset (a kernel can approximate either the expected value or the distribution of the width of the point set in every direction) and we can construct such kernel coresets in nearly linear time. We show its applications to several function extent problems, tracking moving points, and shape fitting problems in the stochastic setting.

  11. Optimal PAC Multiple Arm Identification with Applications to Crowdsourcing, Yuan Zhou, Xi Chen, Jian Li. International Conference on Machine Learning (ICML 2014).  [full version]

    • We study the problem of selecting K arms with the highest expected rewards in a stochastic N-armed bandit game.  We propose a new PAC algorithm, which, with probability at least 1-delta , identifies a set of K arms with average reward at most \epsilon away from the top-k arm. Naive uniform sampling requires O(nlnn) samples. We show it is possible to achieve linear sample complexity. We also establish a matching lower bound (meaning our upper bound is worse-case optimal). 

  12. Range Queries on Uncertain Data, Jian Li, Haitao Wang International Symposium on Algorithms and Computation (ISAAC 2014). [paper]

    • We are given a set of stochastic points on the real line, we consider the problem of building data structures on P to answer range queries: find the top-k points that lie in the given interval with the highest probability.

  13. A Fully Polynomial Approximation Scheme for Approximating a Sum of Random Variables, Jian Li and Tianlin Shi.  In Operation Research Letters (ORL), 2014 [ArXiv]

    • We show there is an FPTAS for approximating Pr[¡ÆX_i¡Ü1] for a set of independent (not necessarily identically distributed) random variables X_i.

  14. Stochastic Combinatorial Optimization via Poisson Approximation. Jian Li and Wen Yuan. In the 45th ACM Symposium on the Theory of Computing (STOC 2013), USA,2 013 [Paper] [Full version in ArXiv]

    • We propose a new technique, called Poisson approximation, for approximating/optimizing the sum of a set of random variables. In many cases, it reduces the stochastic problem to a deterministic multidimensional packing/covering problem. We can apply it to several problems, such as

      • Expected utility maximization (we can reproduce and generalize one result in our FOCS11 paper.)

      • Stochastic knapsack (proposed in [Dean et al. MOR08]): we are given a set items. The size and/or the profit of an item may not be fixed values and only their probability distributions are known to us in advance. The actual size and profit of an item are revealed to us as soon as it is inserted into the knapsack. We want to find a policy to insert the items such that the total expected profit is maximized. We obtain a bicriterion PTAS (i.e., 1+eps approximation with 1+eps capacity), even we allow correlation and cancellation of jobs.

      • Bayesian Online Selection with knapsack constraint: it is a variant of stochastic knapsack where the size and the profit of an item are revealed before the decision whether to select the item is made.

  15. When LP is the Cure for Your Matching Woes: Improved Bounds for Stochastic Matchings. Nikhil Bansal, Anupam Gupta, Jian Li, Julian Mestre, Viswanath Nagarajan, Atri Rudra. In the 18th Annual European Symposium on Algorithms (ESA 2010). (Best Paper Award) [Paper][Slides]
    Journal version in Algorithimca, 2011.[Journal Version]

    • We study the stochastic matching problem, which finds applications in kidney exchange, online dating and online ads. Consider a random graph model where each possible edge e is present independently with some probability pe. We are given these numbers pe, and want to build a large/heavy matching in the randomly generated graph. However, the only way we can find out whether an edge is present or not is to query it, and if the edge is indeed present in the graph, we are forced to add it to our matching. Further, each vertex i is allowed to be queried at most ti times. How should we adaptively query the edges to maximize the expected weight of the matching? Our main result is the first constant approximation for the weighted stochastic matching.


Approximation Algorithms (for NP-hard scheduling, graph, geometry problems)

  1. A Constant Factor Approximation Algorithm for Fault-Tolerant k-Median. Mohammadtaghi Hajiaghayi, Wei Hu, Jian Li, Shi Li, Barna Saha. In the ACM-SIAM Symposium on Discrete Algorithms (SODA 2014), Portland, Oregon, USA  [ArXiv]

    • We present the first constant approximation for the fault-tolerant k-median problem even when the demands are nonuniform (each demand point should be assigned to a given number of facilities). This generalizes a previous constant approximation for the uniform case and improves on a logarithmic approximation for the general nonuniform case.

    • We present the first constant approximation for the fault-tolerant facility location problem even when the weight function are non-monotone.  This generalizes a previous constant approximation for the monotone case.

  2. Ranking with Diverse Intents and Correlated Contents. Jian Li and Zeyu Zhang, In the 33rd Foundations of Software Technology and Theoretical Computer Science conference (FSTTCS 2013), IIT Guwahati, India. [ArXiv]

  3. Matroid and Knapsack Center Problems. Danny Z. Chen, Jian Li, Hongyu Liang, and Haitao Wang. In The 16th Conference on Integer Programming and Combinatorial Optimization (IPCO 2013), Chile, 2013 [Paper] [Full version in ArXiv; Journal version accepted in Algorithmica, 2015.

    • We consider two generalizations to the classic k-center problem. We require that the opened centers satisfy a given matroid constraint or a knapsack constraint.

  4. Generalized Machine Activation Problems. Jian Li and Samir Khuller. In the ACM-SIAM Symposium on Discrete Algorithms (SODA 2011),  San Francisco, USA, 2011. [Paper][slides]

    • We obtain tight approximation results for the machine activation problem proposed in our SODA10 work. We also obtain the first lnn-approximation for the universal facility location problem in non-metric spaces, answering an open question raised in previous work.

  5. Densest k-Subgraph Approximation on Intersection Graphs. Danny Z. Chen, Rudolf Fleischer, Jian Li. In the 8th Workshop on Approximation and Online Algorithms (WAOA 2010). [Paper][slides]

    • We provide constant factor approximation algorithm for the densest k-subgraph problem on several graph classes, such as chordal graphs, disk graphs etc. 

  6. New Models and Algorithms for Throughput Maximization in Broadcast Scheduling. Chandra Chekuri, Avigdor Gal, Sungjin Im, Samir Khuller, Jian Li, Richard McCutchen, Benjamin Moseley, Louiqa Raschid. In the 8th Workshop on Approximation and Online Algorithms (WAOA 2010). [slides] [full version]

  7. Clustering with Diversity. Jian Li, Ke Yi, Qin Zhang. In the 37th International Colloquium on Automata, Languages and Programming (ICALP 2010),July 5-10, 2010. [full version in arXiv]

    • We consider the clustering with diversity problem: given a set of colored points in a metric space, partition them into clusters such that each cluster has at least ell points, all of which have distinct colors. We give a 2-approximation to this problem for any ell when the objective is to minimize the maximum radius of any cluster. We show that the approximation ratio is optimal unless P\ne NP, by providing a matching lower bound. Several extensions to our algorithm have also been developed for handling outliers. This problem can be considered as a metric variant of the l-diversity problem, a popular problem for privacy-preserving data publication

  8. Energy Efficient Scheduling via Partial Shutdown. Samir Khuller, Jian Li, Barna Saha. In the ACM-SIAM Symposium on Discrete Algorithms (SODA 2010),  Austin, USA , 2010. [Paper]

    • The central framework we introduce considers a collection of m machines (unrelated or related) with each machine i having an activation cost of ai. There is also a collection of n jobs that need to be performed, and pi,j is the processing time of job j on machine i. Standard scheduling models assume that the set of machines is fixed and all machines are available. However, in our setting, we assume that there is an activation cost budget of A ¨C we would like to select a subset S of the machines to activate with total cost a(S) A and find a schedule for the n jobs on the machines in S minimizing the makespan (or any other metric).

  9. The load-distance balancing problem. Edward Bortnikov, Samir Khuller, Jian Li, Yishay Mansour and Seffi Naor. Networks, 2012.[Paper] [doi]

  10. Approximating the Maximum Sharing Problem. Amitabh Chaudhary, Danny Z. Chen, Rudolf Fleischer, Jian Li, Xiaobo S. Hu, Michael T. Niemier, Zhiyi Xie, Hong Zhu. In Proceedings of 10th Workshop on Algorithms and Data Structures(WADS 2007) Halifax, Canada, August 15-17, 2007. [Paper]

  11. On approximating the maximum simple sharing problem. Danny Z. Chen, R. Fleischer, Jian Li, Zhiyi Xie, and Hong Zhu. In Proceedings of the 17th International Symposium on Algorithms and Computation (ISAAC 2006), Kolkata, December 18-20, 2006. [slides][Paper]

  12. Non-metric multicommodity and multilevel facility location. R. Fleischer, Jian Li, Shijun Tian, and Hong Zhu. In Proceedings of the 2nd International Symposium on Algorithmic Aspects in Information and Management (AAIM 2006), Hong Kong, Jun 20-22, 2006. Springer LNCS 4041, 2006, pp. 138-148. [Paper]

  13. Approximating spanning trees with inner nodes cost. R. Fleischer, Qi Ge, Jian Li, Shijun Tian, and Haitao Wang. In Proceedings of the 6th International Conference on Parallel and Distributed Computing (PDCAT 2005), Dalian, China, Dec 5-8, 2005, pp. 600-664. [full version]

    • We study the problem of finding a minimum spanning tree with both edge weights and inner node (non-leaf node) weights.


Computational Geometry

  1. K-Means Clustering with Distributed Dimension. Hu Ding, Yu Liu, Lingxiao Huang, Jian Li. The 33rd International Conference on Machine Learning (ICML 2016).

  2. Almost All Even Yao-Yao Graphs Are Spanners. Jian Li, Wei Zhan. The 24rd Annual European Symposium on Algorithms (ESA 2016). [ArXiv]

  3. A PTAS for the Weighted Unit Disk Cover Problem, Jian Li, Yifei Jin. The 42nd International Colloquium on Automata, Languages, and Programming (ICALP 2015) , to appear [ArXiv]

  4. Approximating the Expected Values for Combinatorial Optimization Problems over Stochastic Points. Lingxiao Huang, Jian Li. The 42nd International Colloquium on Automata, Languages, and Programming (ICALP 2015), to appear. [ArXiv]  subsumes a previous draft: Minimum Spanning Trees, Perfect Matchings and Cycle Covers Over Stochastic Points in Metric Spaces.

  5. Approximation Algorithms for the Connected Sensor Cover Problem. Lingxiao Huang, Jian Li, Qicai Shi. The 21st Annual International Computing and Combinatorics Conference (COCOON'15) , to appear [paper]

  6. Linear Time Approximation Schemes for Geometric Maximum Coverage, Jian Li, Haitao Wang, Bowei Zhang, Ningye Zhang. The 21st Annual International Computing and Combinatorics Conference (COCOON'15) , to appear [paper]

  7. On the Inequalities of Projected Volumes and the Constructible Region, Zihan Tan, Liwei Zeng, Jian Li. [ArXiv]

    • We study the following geometry problem: given a 2^n−1 dimensional vector ¦Ð (indexed by all subsets of [n]) , is there an object T in R^n such that vol(T_S)=¦Ð_S , for all subsets S\subset [n] , where T_S is the projection of T to the subspace spanned by the axes in S? 

  8. epsilon-Kernel Coresets for Stochastic Points. Jian Li, Jeff Phillips and Haitao Wang The 24rd Annual European Symposium on Algorithms (ESA 2016). [AxXiv]

    • We consider the standard stochastic geometry model in which the existence or the location of each point may be stochastic. We show there exists constant-size kernel coreset (a kernel can approximate either the expected value or the distribution of the width of the point set in every direction) and we can construct such kernel coresets in nearly linear time. We show its applications to several function extent problems, tracking moving points, and shape fitting problems in the stochastic setting.

  9. Range Queries on Uncertain Data, Jian Li, Haitao Wang, International Symposium on Algorithms and Computation (ISAAC 2014). [paper]

    • We are given a set of stochastic points (with uniform distr) on the real line, we consider the problem of building data structures on P to answer range queries: find the top-k points that lie in the given interval with the highest probability.

  10. Algorithms on Minimizing the Maximum Sensor Movement for Barrier Coverage of a Linear Domain; Danny Z. Chen, Yan Gu, Jian Li and Haitao Wang; In 13th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2012), Helsinki, Finland, 2012 [ArXiv]. Journal version in Discrete & Computational Geometry (DCG), 2013 [Journal doi]

    • Given a set of small intervals, and a target interval I, we would like to move the small intervals to cover the target interval. Our goal is to minimize the maximum movement of any interval. We obtain an O(n^2 log n) algorithm for this problem.

  11. Trinary-Projection Trees for Approximate Nearest Neighbor Search. Jingdong Wang, Naiyan Wang, You Jia; Jian Li, Gang Zeng, Hongbin Zha, Xian-Sheng Hua. The IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI), 2013 [Paper]

  12. Efficient Algorithms for One-Dimensional k-Center Problems. Danny Z. Chen, Jian Li, Haitao Wang. Theoretical Computer Science (TCS), 2015 (accepted) [ArXiv]

    • We consider the problem of finding k centers for n weighted points on a real line. This (weighted) k-center problem was solved in O(n log n) time previously by using Cole's parametric search and other complicated approaches. In this paper, we present an easier O(n log n) time algorithm that avoids the parametric search, and in certain special cases our algorithm solves the problem in O(n) time.

  13. Traversing the machining graph. Danny Z. Chen, R. Fleischer, Jian Li, Haitao Wang, and Hong Zhu. In Proceedings of the 14th Annual International Symposium on Algorithms (ESA 2006), Zurich, Switzerland, Sep 11-13, 2006. Springer LNCS. [full version]


Databases

  1. Human-in-the-loop Outlier Detection. Chengliang Chai, Lei Cao, Guoliang Li, Jian Li, Yuyu Luo, Samuel Madden. 2020 ACM International Conference on Management of Data (SIGMOD 2020). [Paper]

  2. Efficient Algorithms for Crowd-Aided Categorization. Yuanbing Li, Xian Wu, Yifei Jin, Jian Li, Guoliang Li. International Conference on Very Large Data Bases (VLDB 2020). [Paper]

  3. Cleaning Relations using Knowledge Bases. Shuang Hao, Nan Tang, Guoliang Li, Jian Li. The 33th IEEE International Conference on Data Engineering (ICDE 2017). [paper]

  4. k-Regret Minimizing Set: Efficient Algorithms and Hardness. Wei Cao, Jian Li, Haitao Wang, Kangning Wang, Ruosong Wang, Raymond Chi-Wing Wong and Wei Zhan. The 20th International Conference on Database Theory (ICDT 2017), Venice, Italy. (best newcomer award) [paper] [full version]

  5. Cost-Effective Crowdsourced Entity Resolution: A Partial-Order Approach, Chengliang Chai, Guoliang Li, Jian Li, Dong Deng, Jianhua Feng. The annual ACM SIGMOD conference 2016. [ Show Abstract ]

  6. ETCPS: An Effective and Scalable Traffic Condition Prediction System, Dong Wang, Wei Cao, Mengwen Xu and Jian Li The 21st International Conference on Database Systems for Advanced Applications (DASFAA 2016).

  7. Automatic User Identification Method across Heterogeneous Mobility Data Souces. Wei Cao, Zhengwei Wu, Jian Li, Haishan Wu. In International Conference on Data Engineering (ICDE2016).

  8. Efficient Similarity Search and Join on Multi-Attribute Data. Guoliang Li, Jian He, Deng Dong, Jian Li. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2015).

  9. Scalable Column Concept Determination for Web Tables Using Large Knowledge Bases. Dong Deng, Yu Jiang, Guoliang Li, Jian Li, Cong Yu. In the 39th International Conference on Very Large Data Bases (VLDB 2013), Italy, 2013 [Paper][Full version].

  10. DataSynth: Generating Synthetic Data using Declarative Constraints. Arvind Arasu, Raghav Kaushik, and Jian Li. In the 37th International Conference on Very Large Data Bases (VLDB 2011), Seattle, Wasington, 2011. (Demo)

  11. Data Generation using Declarative Constraints. Arvind Arasu, Raghav Kaushik, and Jian Li. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2011), Athens, Greece, 2011. [Paper]

    • We study the problem of generating synthetic databases having declaratively specified characteristics. This problem is motivated by database system and application testing, data masking, and benchmarking. We argue that a natural, expressive, and declarative mechanism for specifying data characteristics is through cardinality constraints; a cardinality constraint specifies that the output of a query over the generated database have a certain cardinality. While the data generation problem is in tractable in general, we present efficient algorithms that can handle a large and useful class of constraints.

  12. Sensitivity Analysis and Explanations for Robust Query Evaluation in Probabilistic Databases. Bhargav Kanagal, Jian Li, Amol Deshpande. In Proceedings of the ACM SIGMOD International Conference on Management of Data (SIGMOD 2011), Athens, Greece, 2011. [Paper]

  13. Ranking Continuous Probabilistic Datasets. Jian Li and Amol Deshpande. In the 36th International Conference on Very Large Data Bases (VLDB 2010), Singapore, 2010. [Paper] [slides]

    • We extend our VLDB09 paper to continuous distributions. We give exact algorithm for polynomial PDFs and approximation algorithms for arbitrary continuous distributions using techniques in approximation theory, such as splines, quadratures, with convergence guarantees better than naive Monte Carlo.

  14. A Unified Approach to Ranking in Probabilistic Databases. Jian Li, Barna Saha and Amol Deshpande. In the 35th International Conference on Very Large Data Bases (VLDB 2009), Lyon, France, 2009. (Best Paper Award) [Paper][Slides long short]
    Journal version: A Unified Approach to Ranking in Probabilistic Databases.  Jian Li, Barna Saha and Amol Deshpande. The VLDB Journal, 2011. [Journal Version
    ]

    • We propose two parameterized ranking functions, called PRF and PRFe, for ranking probabilistic datasets.  PRF and PRFe generalize or can approximate many of the previously proposed ranking functions. We present novel generating functions-based algorithms for efficiently ranking large datasets according to these ranking functions.

  15. Consensus Answers for Queries over Probabilistic Databases. Jian Li and Amol Deshpande. In the 28th ACM Symposium on Principles of Database Systems (PODS 2009). Providence, USA, 2009 [Paper][slides]

    • We address the problem of finding a ¡°best¡± deterministic query answer to a query over a probabilistic database. We propose the notion of a consensus world (or a consensus answer) which is a deterministic world (answer) that minimizes the expected distance to the possible worlds (answers). We consider this problem for various types of queries including SPJ queries, Top-k ranking queries, group-by aggregate queries, and clustering. For different distance metrics, we obtain polynomial time optimal or approximation algorithms for computing the consensus answers (or prove NP-hardness).

  16. Minimizing communication cost in distributed multi-query processing. Jian Li, Amol Deshpande and Samir Khuller. In International Conference on Data Engineering (ICDE2009), Shanghai, China, 2009 [Paper][slides]

    • We are also given a set of queries, Q1.....Qm, with the query Qi requiring access to a subset of relations distributed in different nodes of the network. For each query, a query plan is provided, in the form of a rooted tree, which specifies the operations to be performed on the relations and the order in which to perform them. Given this input, our goal is to find a data movement plan that minimizes the total communication cost incurred while executing the queries. We provide efficient exact algorithm when the communication network is a tree and approximation algorithm for more general networks.


PTIME Problems (data structures, computational geometry, efficient algorithms for problems in PTIME)

  1. Improved Algorithms For Structured Sparse Recovery. Lingxiao Huang, Yifei Jin, Jian Li, Haitao Wang. [ArXiv]

  2. k-Regret Minimizing Set: Efficient Algorithms and Hardness. Wei Cao, Jian Li, Haitao Wang, Kangning Wang, Ruosong Wang, Raymond Chi-Wing Wong and Wei Zhan. The 20th International Conference on Database Theory (ICDT 2017), Venice, Italy. (best newcomer award) [paper] [full version]

  3. Balanced Splitting on Weighted Intervals. Wei Cao, Jian Li, Shimin Li, and Haitao Wang. Operation Research Letters, 2015 (accepted).

  4. Range Queries on Uncertain Data, Jian Li, Haitao Wang, International Symposium on Algorithms and Computation (ISAAC 2014). [paper]

    • We are given a set of stochastic points (with uniform distr) on the real line, we consider the problem of building data structures on P to answer range queries: find the top-k points that lie in the given interval with the highest probability.

  5. Algorithms on Minimizing the Maximum Sensor Movement for Barrier Coverage of a Linear Domain; Danny Z. Chen, Yan Gu, Jian Li and Haitao Wang; In 13th Scandinavian Symposium and Workshops on Algorithm Theory (SWAT 2012), Helsinki, Finland, 2012 [ArXiv] . Journal version in Discrete & Computational Geometry (DCG), 2013 [Journal doi]

    • Given a set of small intervals, and a target interval I, we would like to move the small intervals to cover the target interval. Our goal is to minimize the maximum movement of any interval. We obtain an O(n^2 log n) algorithm for this problem.

  6. Efficient Algorithms for One-Dimensional k-Center Problems. Danny Z. Chen, Jian Li, Haitao Wang. [ArXiv]

    • We consider the problem of finding k centers for n weighted points on a real line. This (weighted) k-center problem was solved in O(n log n) time previously by using Cole's parametric search and other complicated approaches. In this paper, we present an easier O(n log n) time algorithm that avoids the parametric search, and in certain special cases our algorithm solves the problem in O(n) time.

  7. Efficient Algorithms for k-Disjoint Paths Problems on DAGs. Rudolf Fleischer, Qi Ge, Jian Li, Hong Zhu. In Proceedings of the 3nd International Symposium on Algorithmic Aspects in Information and Management (AAIM 2007), Portland, USA, 2007.[Paper]

  8. Traversing the machining graph. Danny Z. Chen, R. Fleischer, Jian Li, Haitao Wang, and Hong Zhu. In Proceedings of the 14th Annual International Symposium on Algorithms (ESA 2006), Zurich, Switzerland, Sep 11-13, 2006. Springer LNCS. [full version]


Game Theory and Computational Economics

  1. Optimal Allocation for Chunked-Reward Advertising. Weihao Kong, Jian Li, Tao Qin, Tie-Yan Liu,  In The 9th Conference on Web and Internet Economics (WINE 2013), Cambridge, MA, USA [ArXiv]

  2. Egalitarian Pairwise Kidney Exchange: Fast Algorithms via Linear Programming and Parametric Flow. Jian Li, Yicheng Liu, Lingxiao Huang, Pingzhong Tang. In the 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2014) [paper]

    • We obtain an efficient poly-time algorithm for the pairwise kidney exchange problem proposed by Roth et al. [J. Econ.Th. 05]. Their original algorithm runs in exponential time. We also provide an alternative short proof of the fact that there exists a majorizing vector in a polymatroid (original proof due to Tamir [MOR95]).

  3. The load-distance balancing problem. Edward Bortnikov, Samir Khuller, Jian Li, Yishay Mansour and Seffi Naor. Networks, 2012.[Paper] [doi]

    • We study a problem referred to as the load-distance balancing (LDB) problem, where the objective is assigning a set of clients to a set of given servers. Each client suffers a delay, that is, the sum of the network delay (which is proportional to the distance to its server) and the congestion delay at this server, a nondecreasing function of the number of clients assigned to the server. We address two flavors of LDB¡ªthe first one seeking to minimize the maximum incurred delay, and the second one targeted for minimizing the average delay.  We also provide bounds for the price of anarchy for the game theoretic version of the problem.

  4. An O({logn\over loglogn}) Upper Bound on the Price of Stability for Undirected Shapley Network Design Games. Jian Li. In Information Processing Letter (IPL). 2009. [Paper][slides]

    • We have an edge weighted undirected network G(V, E) and n selfish players where player i wants to choose a low cost path from source vertex si to destination vertex ti . The cost of each edge is equally split among players who pass it. The price of stability is defined as the ratio of the cost of the best Nash equilibrium to that of the optimal solution. We present an O(logn/ log logn) upper bound on price of stability for the single sink case, i.e., ti =t for all i.

  5. Algorithms for core stability, core largeness, exactness, and extendability of flow games. Qizhi Fang, R. Fleischer, Jian Li, and Xiaoxun Sun. In Proceedings of the 13th Annual International Computing and Combinatorics Conference (COCOON 2007), Banff, Canada, July 16-19, 2007.
    Journal version in Frontier of Math in China, vol 5(1), pp.47-63, 2010. [Journal Version
    ]

    • We study core stability and some related properties of flow games defined on simple networks (all edge capacities are equal) from an algorithmic point of view. We first present a sufficient and necessary condition that can be tested efficiently for a simple flow game to have a stable core. We also prove the equivalence of the properties of core largeness, extendability, and exactness of simple flow games and provide an equivalent graph theoretic characterization which allows us to decide these properties in polynomial time.


Networks

  1. Approximation Algorithms for the Connected Sensor Cover Problem. Lingxiao Huang, Jian Li, Qicai Shi.

  2. Lingqing Ai, Xian Wu, Lingxiao Huang, Longbo Huang, Pingzhong Tang, and Jian Li, The Multi-shop Ski Rental Problem,  Proceedings of ACM SIGMETRICS, 2014.  [paper]

    • We consider the multi-shop ski rental problem. This problem generalizes the classic ski rental problem to a multi-shop setting, in which each shop has different prices for renting and purchasing a pair of skis, and a consumer has to make decisions on when and where to buy. We given optimal close-form online competitive strategy from the consumer's perspective and a linear time algorithm for computing the optimal strategy. The problem finds applications in cost management in IaaS cloud and scheduling in distributed computing.

  3. On the Energy Efficiency of Device Discovery in Mobile Opportunistic Networks: A Systematic Approach. Bo Han; Jian Li,  Aravind Srinivasan.   IEEE Transactions on Mobile Computing (TMC), 2014 (accepted) [Paper]

  4. Your Friends Have More Friends Than You Do: Identifying Influential Mobile Users Through Random Walks. Bo Han, Jian Li and Aravind Srinivasan. IEEE/ACM Transactio​ns on Networking (TON), 2013 (accepted) [Paper]

    • we investigate the problem of identifying influential users in mobile social networks. Influential users are individuals with high centrality in their social-contact graphs. Traditional approaches find these users through centralized algorithms. However, the computational complexity of these algorithms is known to be very high, making them unsuitable for large-scale networks. We propose a lightweight and distributed protocol, iWander, to identify influential users through fixed length random-walk sampling. We prove that random-walk sampling with O(log n) steps, where n is the number of nodes in a graph, comes quite close to sampling vertices approximately according to their degrees. The most attractive feature of iWander is its extremely low control-message overhead, which lends itself well to mobile applications.

  5. On Computing Compression Trees for Data Collection in Wireless Sensor Networks. Jian Li, Amol Deshpande and Samir Khuller. In the 29th Conference on Computer Communications (INFOCOM 2010), San Diego, USA, 2010 [Paper] [slides]

  6. Minimizing communication cost in distributed multi-query processing. Jian Li, Amol Deshpande and Samir Khuller. In International Conference on Data Engineering (ICDE2009), Shanghai, China, 2009 [Paper][slides]

    • We are also given a set of queries, Q1.....Qm, with the query Qi requiring access to a subset of relations distributed in different nodes of the network. For each query, a query plan is provided, in the form of a rooted tree, which specifies the operations to be performed on the relations and the order in which to perform them. Given this input, our goal is to find a data movement plan that minimizes the total communication cost incurred while executing the queries. We provide efficient exact algorithm when the communication network is a tree and approximation algorithm for more general networks.


Others

  1. A Pruned Exhaustive Search Algorithm for Nearly Optimal Diversified Result Ranking, Fei Chen, Yiqun Liu, Jian Li, Min Zhang and Shaoping Ma, 23rd International World Wide Web Conference, Seoul, Korea, April 7-11, 2014 (Poster)

  2. More Efficient Algorithms and Analyses for Unequal Letter Cost Prefix-Free Coding. Mordecai Golin, Jian Li. In Proceedings of the 18th International Symposium on Algorithms and Computation (ISAAC 2007), Sendai, Japan, 2007.
    Journal version In  IEEE Transactions on Information Theory, Volume 54, Issue 8, Aug. Page(s):3412 - 3424, 2008 [Journal Version
    ]

    • There is a large literature devoted to the problem of finding an optimal (min-cost) prefix-free code with an unequal letter-cost encoding alphabet of size. While there is no known polynomial time algorithm for solving it optimally, there are many good heuristics that all provide additive errors to optimal. The additive error in these algorithms usually depends linearly upon the largest encoding letter size.
      This paper was motivated by the problem of finding optimal codes when the encoding alphabet is infinite. Because the largest letter cost is infinite, the previous analyses could give infinite error bounds. We provide a new algorithm that works with infinite encoding alphabets. When restricted to the finite alphabet case, our algorithm often provides better error bounds than the best previous ones known.

¡¡

¡¡