Deep Reinforcement Learning

Deep reinforcement learning (DRL) has achieved groundbreaking success on many decision-making problems, in both discrete and continuous domains, including robotics, game playing, and many others. In reinforcement learning, an agent does not have access to the environment's transition dynamics and reward function, so it must learn an optimal policy by interacting with the unknown environment. This interaction-driven learning can suffer from high sample complexity, which limits its applicability to high-dimensional real-world problems. It is therefore important for the agent to learn in efficient and practical ways.

In this research thrust, we are interested in (i) designing efficient, practical DRL algorithms in specific domains, and (ii) improving the sample efficiency of RL algorithms for general tasks.
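
One concrete ingredient from this line of work is the Boltzmann softmax operator, a smooth alternative to the hard max in value updates (cf. the softmax and Boltzmann-softmax papers below). The sketch below is minimal; the temperature values and toy Q-values are purely illustrative:

```python
import numpy as np

def boltzmann_softmax_value(q_values, beta):
    """Boltzmann softmax operator: a smooth, differentiable alternative
    to max(Q) in value updates; beta -> infinity recovers the hard max."""
    # Subtract the max before exponentiating for numerical stability.
    z = beta * (q_values - np.max(q_values))
    weights = np.exp(z) / np.sum(np.exp(z))
    return float(np.dot(weights, q_values))

q = np.array([1.0, 1.2, 0.8])  # toy action values (illustrative)
for beta in (1.0, 5.0, 50.0):
    print(beta, boltzmann_softmax_value(q, beta))  # approaches max(q) = 1.2
```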

Recent publications:

  1. L. Pan, Q. Cai and L. Huang, “Softmax Deep Double Deterministic Policy Gradients,” Proceedings of the Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS), December 2020.

  2. L. Pan, Q. Cai, Q. Meng, W. Chen, L. Huang, “Reinforcement Learning with Dynamic Boltzmann Softmax Updates,” Proceedings of the International Joint Conference on Artificial Intelligence – Pacific Rim International Conference on Artificial Intelligence (IJCAI), July 2020.

  3. L. Pan, Q. Cai and L. Huang, “Multi-Path Policy Optimization,” Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2020. [Selected for fast-track publication at JAAMAS]

  4. L. Pan, Q. Cai, Z. Fang, P. Tang, and L. Huang, “A Deep Reinforcement Learning Framework for Rebalancing Dockless Bike Sharing Systems,” Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), January 2019. arXiv:1802.04592

Online Learning

The existing online learning framework focuses on learning parameters from a sequence of feedback signals in an idealized setting. Real-world applications, however, can be far more complicated, and many challenges remain in bringing the model closer to reality. For example, in some online systems it is the users, not the system itself, who decide which actions to take; how to influence users so that the system learns efficiently from their feedback remains unsolved. As another example, feedback on an action may arrive with delay and may be mixed with the feedback from earlier actions. How to learn and control optimally in such settings remains largely open.

In this thrust, we are interested in designing efficient algorithms for online learning under such real-world constraints, and in extending existing online learning results.
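
For context, here is a minimal sketch of the classic UCB1 index policy, the textbook baseline that conservative bandits and bandits with composite/anonymous feedback (see the publications below) build on; the Bernoulli arms, their means, and the horizon are toy assumptions:

```python
import math
import random

def ucb1(pull, n_arms, horizon):
    """Classic UCB1: pull each arm once, then pick the arm with the
    highest empirical mean plus an exploration bonus."""
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialization: try every arm once
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        sums[arm] += pull(arm)
        counts[arm] += 1
    return counts

# Toy Bernoulli arms with unknown means (illustrative assumption).
means = [0.3, 0.5, 0.7]
pull = lambda a: 1.0 if random.random() < means[a] else 0.0
print(ucb1(pull, n_arms=3, horizon=2000))  # pulls concentrate on the best arm
```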

Recent publications:

  1. Y. Du, S. Wang, L. Huang, “A One-Size-Fits-All Solution to Conservative Bandit Problems,” Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI), February 2021.

  2. S. Wang, H. Wang, L. Huang, “Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback,” Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI), February 2021.

  3. W. Chen, Y. Du, L. Huang, H. Zhao, “Combinatorial Pure Exploration for Dueling Bandit,” Proceedings of the 2020 International Conference on Machine Learning (ICML), July 2020.

  4. Y. Du, S. Wang and L. Huang, “Dueling Bandits: From Two-dueling to Multi-dueling,” Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2020.

  5. S. Wang and L. Huang, “Multi-armed Bandits with Compensation,” Proceedings of the Thirty-second Conference on Neural Information Processing Systems (NeurIPS), December 2018.

Stochastic Optimization and Machine Learning

Efficient algorithms are key to machine learning and data science. It is critical to design algorithms that have provable performance guarantees and fast convergence. Moreover, in many practical settings, it is also important to design algorithms that allow distributed implementation and are robust to communication errors or delays.

In this thrust, we are interested in designing efficient distributed algorithms for both convex and non-convex optimization problems.
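
To give a flavor of the techniques involved, the sketch below implements unbiased stochastic uniform quantization of a vector, a standard building block in communication-efficient distributed optimization (and one ingredient of schemes such as double quantization); the number of quantization levels and the random input are illustrative:

```python
import numpy as np

def stochastic_quantize(v, levels):
    """Unbiased stochastic uniform quantization: each coordinate is
    rounded to one of `levels` grid points, randomly so that the
    expectation equals v. Fewer levels means fewer bits on the wire."""
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v
    scaled = np.abs(v) / norm * levels        # map |v_i| into [0, levels]
    lower = np.floor(scaled)
    # Round up with probability equal to the fractional part -> unbiased.
    q = lower + (np.random.rand(*v.shape) < scaled - lower)
    return np.sign(v) * q * norm / levels

g = np.random.randn(5)                        # stand-in for a gradient
print(g)
print(stochastic_quantize(g, levels=4))       # coarse, unbiased estimate of g
```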

Recent publications:

  1. Y. Yu, J. Wu and L. Huang, “Double Quantization for Communication-Efficient Distributed Optimization,” Proceedings of the Thirty-third Conference on Neural Information Processing Systems (NeurIPS), December 2019.

  2. Y. Yu and L. Huang, “Fast Stochastic Variance Reduced ADMM for Stochastic Composition Optimization,” Proceedings of the 26th International Joint Conference on Artificial Intelligence (IJCAI) (full paper), August 2017.

Learning-aided Stochastic Network Optimization

Existing network optimization results have mostly focused on designing algorithms based either on full a priori statistical knowledge of the system or on stochastic approximation for systems with zero prior knowledge. These two scenarios, though very general, do not explicitly capture the role of information learning in control, nor do they reap its potential benefits. This omission often leads to a mismatch between algorithms in the literature and practical control schemes, where system information (data) is constantly collected and incorporated into operations.

In this research thrust, we are interested in (i) quantifying fundamental benefits and limits of learning in network optimization, and (ii) designing simple and practical algorithms for achieving the full benefits.
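
For intuition about the baseline machinery, here is a minimal drift-plus-penalty sketch for a single queue, the classic Lyapunov-based approach that learning-aided methods augment; the arrival process, the two service options, and the parameter V are toy assumptions, not taken from any of the papers below:

```python
import random

def drift_plus_penalty(horizon, V=10.0):
    """Single-queue drift-plus-penalty: each slot, pick the service
    option minimizing V * cost - Q * rate, trading operating cost
    against queue backlog. Arrival and cost models are toy assumptions."""
    Q = 0.0
    options = [(0.0, 0.0), (1.0, 0.8)]        # (service rate, cost) pairs
    for _ in range(horizon):
        arrival = random.uniform(0.0, 1.0)    # exogenous arrivals, mean 0.5
        rate, cost = min(options, key=lambda rc: V * rc[1] - Q * rc[0])
        Q = max(Q + arrival - rate, 0.0)
    return Q

# Here the rule serves whenever Q > V * 0.8, so the backlog hovers near 8:
# larger V lowers cost but grows backlog, the classic O(V) tradeoff.
print(drift_plus_penalty(10000))
```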

Recent publications:

  1. L. Huang, M. Chen, and Y. Liu, “Learning-aided Stochastic Network Optimization with Imperfect State Prediction,” Proceedings of the 18th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), July 2017.

  2. L. Huang, “The Value-of-Information in Matching with Queues,” Proceedings of the 16th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), June 2015.

  3. L. Huang, “Receding Learning-aided Control in Stochastic Networks,” IFIP Performance (Performance), October 2015.

  4. L. Huang, X. Liu, and X. Hao, “The Power of Online Learning in Stochastic Network Optimization,” Proceedings of ACM Sigmetrics (Sigmetrics full paper), June 2014.

Predictive Control in Information Systems

Rapid advances in machine learning and the study of user behavior have made it possible to learn and predict user behavior, e.g., mobility patterns, user preferences, and software resource demands. Various predictive control schemes have also been deployed in practice and significantly improve user experience. Despite this success, theoretical understanding of how prediction fundamentally impacts system performance remains limited. Moreover, it is not clear how prediction can be efficiently incorporated into control algorithm design.

In this thrust, we are interested in establishing a general framework for studying the impact of prediction and for designing predictive algorithms, with the objective of improving upon the performance of causal (prediction-free) algorithms.
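
As a toy illustration of why prediction helps, the simulation below compares a single-server queue with and without a small prediction window that lets an idle server pre-serve foreseen arrivals; the Bernoulli arrival and service models, the window size, and the delay accounting are illustrative assumptions, not a reproduction of the analyses in the papers below:

```python
import random

def avg_delay(horizon, lookahead, p_arrive=0.4, p_serve=0.5):
    """Slotted single-server queue. With a prediction window, an idle
    server pre-serves a request it foresees arriving within `lookahead`
    slots. All model parameters are illustrative assumptions."""
    arrivals = [random.random() < p_arrive for _ in range(horizon)]
    queue, delays = [], []
    for t in range(horizon):
        if arrivals[t]:
            queue.append(t)
        if random.random() < p_serve:          # one service opportunity
            if queue:
                delays.append(t - queue.pop(0))
            else:
                # Idle: pre-serve the next predicted arrival, if any.
                for dt in range(1, lookahead + 1):
                    if t + dt < horizon and arrivals[t + dt]:
                        arrivals[t + dt] = False
                        delays.append(0)       # served before it arrives
                        break
    return sum(delays) / max(len(delays), 1)

print(avg_delay(100000, lookahead=0))   # reactive: noticeable average delay
print(avg_delay(100000, lookahead=5))   # proactive: delay drops sharply
```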

Recent publications:

  1. K. Chen and L. Huang, “Timely-Throughput Optimal Scheduling with Prediction,” IEEE International Conference on Computer Communications (INFOCOM), April 2018.

  2. L. Huang, M. Chen, and Y. Liu, “Learning-aided Stochastic Network Optimization with Imperfect State Prediction,” Proceedings of the 18th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), July 2017.

  3. S. Zhang, L. Huang, M. Chen, and X. Liu, “Proactive Serving Decreases User Delay Exponentially: The Light-tailed Service Time Case,” IEEE/ACM Transactions on Networking (TON), vol. 25, issue 2, pp. 708-723, April 2017.

  4. L. Huang, “System Intelligence: Model, Bounds and Algorithms,” Proceedings of the 17th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), July 2016.

  5. L. Huang, S. Zhang, M. Chen, and X. Liu, “When Backpressure meets Predictive Scheduling,” Proceedings of the 15th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), August 2014. (Best Paper Candidate)

Sharing Economy: Welfare and Revenue

The sharing economy has emerged as an enabling method for efficiently utilizing social resources that would otherwise sit underutilized. However, its growth is driven mainly by sharing platforms, whose objectives may not be aligned with social welfare. Many interesting and important questions therefore remain open: How much welfare is lost due to this misalignment of platform objectives? How do prices and subsidies impact system performance? And how do we design optimal loyalty programs?

In this research direction, we are interested in fundamental questions regarding social welfare, platform management, and incentive mechanisms.
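
As a stylized instance of this misalignment, the sketch below works out a toy linear-demand market in which the revenue-maximizing platform price forfeits a quarter of the achievable welfare; the demand curve and zero marginal cost are illustrative assumptions:

```python
# Toy linear-demand market: valuations uniform on [0, 1], demand
# D(p) = 1 - p, zero marginal cost (all illustrative assumptions).
def demand(p):
    return max(1.0 - p, 0.0)

def revenue(p):
    return p * demand(p)

def welfare(p):
    # Consumer surplus + platform revenue = (1 - p^2) / 2 under D(p) = 1 - p.
    return (1.0 - p * p) / 2.0

prices = [i / 100.0 for i in range(101)]
p_star = max(prices, key=revenue)      # platform's revenue-maximizing price
print(p_star)                          # 0.5
print(welfare(p_star), welfare(0.0))   # 0.375 vs. 0.5: a 25% welfare loss
```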

Recent publications:

  1. Z. Fang, L. Huang, and A. Wierman, “Loyalty Programs in the Sharing Economy: Optimality and Competition,” Proceedings of the 19th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), June 2018.

  2. Z. Fang, L. Huang, and A. Wierman, “Prices and Subsidies in the Sharing Economy,” Proceedings of World Wide Web (WWW) (full paper), April 2017. [ArXiv Technical Report, arXiv:1604.01627]