Deep Reinforcement Learning

Deep reinforcement learning (DRL) has achieved groundbreaking success on many decision-making problems, in both discrete and continuous domains, including robotics, game playing, and many others. In reinforcement learning, the agent does not have access to the transition dynamics or the reward function of the environment. It therefore has to learn an optimal policy by interacting with the unknown environment, which can lead to high sample complexity and limits its applicability to high-dimensional real-world problems. It is thus important for the agent to find efficient and practical ways to learn.

In this research thrust, we are interested in (i) designing efficient, practical DRL algorithms in specific domains, and (ii) improving the sample efficiency of RL algorithms for general tasks.
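As a minimal illustration of the interaction loop described above, the sketch below runs tabular Q-learning on a toy chain environment. The environment, hyperparameters, and the `ChainEnv` class are hypothetical examples for exposition, not code from the papers listed below.

```python
import numpy as np

class ChainEnv:
    """Toy 5-state chain: the agent must move right to reach the rewarding terminal state."""
    def __init__(self, n_states=5):
        self.n_states = n_states
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action 0 = left, 1 = right
        self.state = max(0, self.state - 1) if action == 0 else self.state + 1
        done = self.state == self.n_states - 1
        return self.state, (1.0 if done else 0.0), done

# Tabular Q-learning: the agent never sees the transition or reward model;
# it learns only from sampled (s, a, r, s') interactions.
env = ChainEnv()
Q = np.zeros((env.n_states, 2))
alpha, gamma, eps = 0.1, 0.99, 0.1
for _ in range(500):
    s, done = env.reset(), False
    while not done:
        a = np.random.randint(2) if np.random.rand() < eps else int(np.argmax(Q[s]))
        s_next, r, done = env.step(a)
        target = r + (0.0 if done else gamma * np.max(Q[s_next]))
        Q[s, a] += alpha * (target - Q[s, a])  # temporal-difference update
        s = s_next

print(Q)  # the learned values favor moving right toward the goal
```

Every interaction above costs a sample, which is exactly why sample efficiency becomes the bottleneck once the state space is high-dimensional and a deep network replaces the table.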

Recent publications:

  1. L. Pan, L. Huang, T. Ma, H. Xu, “Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification,” Proceedings of the 39th International Conference on Machine Learning (ICML), July 2022.

  2. L. Pan, T. Rashid, B. Peng, L. Huang, S. Whiteson, “Regularized Softmax Deep Multi-Agent Q-Learning,” Proceedings of the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS), December 2021.

  3. L. Pan, Q. Cai and L. Huang, “Softmax Deep Double Deterministic Policy Gradients,” Proceedings of the Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS), December 2020.

  4. L. Pan, Q. Cai, Q. Meng, W. Chen, L. Huang, “Reinforcement Learning with Dynamic Boltzmann Softmax Updates,” Proceedings of the International Joint Conference on Artificial Intelligence – Pacific Rim International Conference on Artificial Intelligence (IJCAI), July 2020.

  5. L. Pan, Q. Cai and L. Huang, “Multi-Path Policy Optimization,” Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS), May 2020. (Selected for fast-track publication at JAAMAS, top 5%)

  6. L. Pan, Q. Cai, Z. Fang, P. Tang, and L. Huang, “A Deep Reinforcement Learning Framework for Rebalancing Dockless Bike Sharing Systems,” Proceedings of the Thirty-Third AAAI Conference on Artificial Intelligence (AAAI), January 2019. arXiv:1802.04592

Online Learning and RL Theory

The existing online learning framework focuses on learning parameters from a sequence of feedback under idealized assumptions. Real-world applications, however, can be much more complicated, and many challenges remain in bringing the models closer to reality. For example, in some online systems it is the users, rather than the system itself, who decide which actions to take; how to influence users so that the system can learn efficiently from their feedback remains unsolved. As another example, action feedback may arrive with delay and may be mixed with the feedback of prior actions. How to learn and control optimally in such settings remains largely open.

In this thrust, we are interested in designing efficient algorithms for online learning problems under real-world constraints, and in extending existing online learning results.
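As a small illustration of the delayed and mixed feedback described above, the sketch below simulates a two-armed bandit in which each pull's reward is spread over the next few rounds and the learner only observes the per-round aggregate. The environment, the delay pattern, and the naive explore-then-commit learner are hypothetical illustrations, not the algorithms from the papers listed below.

```python
import numpy as np

rng = np.random.default_rng(0)
means = [0.3, 0.7]          # true mean rewards of the two arms (unknown to the learner)
spread = [0.5, 0.3, 0.2]    # each pull's reward arrives in pieces over the next 3 rounds
T = 2000

pending = np.zeros(T + len(spread))      # anonymous aggregated feedback per round
est_sum, est_cnt = np.zeros(2), np.zeros(2)

for t in range(T):
    # naive explore-then-commit: alternate arms for 400 rounds, then exploit
    arm = t % 2 if t < 400 else int(np.argmax(est_sum / np.maximum(est_cnt, 1)))

    reward = rng.binomial(1, means[arm])
    for d, w in enumerate(spread):       # the reward is delayed and split across rounds
        pending[t + d] += w * reward

    observed = pending[t]                # only the anonymous per-round aggregate is seen
    est_sum[arm] += observed             # naive credit assignment: attribute everything to
    est_cnt[arm] += 1                    # the arm pulled now, which biases the estimates

print("estimated means:", est_sum / np.maximum(est_cnt, 1))
print("true means:", means)
```

Because the aggregate at round t mixes rewards generated by earlier pulls, the naive estimates are biased; handling this kind of composite, anonymous feedback with provable guarantees is one of the problems studied in the publications below.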

Recent publications:

  1. P. Hu, Y. Chen, L. Huang, “Nearly Minimax Optimal Reinforcement Learning with Linear Function Approximation,” Proceedings of the 39th International Conference on Machine Learning (ICML), July 2022.

  2. J. Huang, Y. Dai, L. Huang, “Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits,” Proceedings of the 39th International Conference on Machine Learning (ICML), July 2022.

  3. T. Jin, L. Huang, H. Luo, “The Best of Both Worlds: Stochastic and Adversarial Episodic MDPs with Unknown Transition,” Proceedings of the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS), December 2021. (Oral, top 1%)

  4. Y. Du, S. Wang, L. Huang, “A One-Size-Fits-All Solution to Conservative Bandit Problems,” Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI), February 2021.

  5. S. Wang, H. Wang, L. Huang, “Adaptive Algorithms for Multi-armed Bandit with Composite and Anonymous Feedback,” Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI), February 2021.

  6. W. Chen, Y. Du, L. Huang, H. Zhao, “Combinatorial Pure Exploration for Dueling Bandit,” Proceedings of the 2020 International Conference on Machine Learning (ICML), July 2020.

  7. S. Wang and L. Huang, “Multi-armed Bandits with Compensation,” Proceedings of the Thirty-second Conference on Neural Information Processing Systems (NeurIPS), December 2018.

Learning Theory

The rapid development of AI techniques has led to the invention of many powerful methods that significantly outperform prior approaches. Yet many of the phenomena observed in empirical studies are not fully understood. For instance, is multi-modality learning always better than single-modality learning? Do training algorithms affect model performance?

In this thrust, we aim to answer such questions by building a rigorous theoretical understanding of popular AI techniques, with an emphasis on multi-modality learning and meta-learning.
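As a toy illustration of the multi-modality question above, the sketch below trains a linear classifier on two synthetic modalities, separately and jointly. The synthetic data and the plain gradient-descent trainer are hypothetical illustrations, not the constructions analyzed in the papers below; in this simple linear setting joint training typically helps, whereas the first paper below studies why joint training of deep multi-modal networks can in fact fail.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
y = rng.integers(0, 2, n) * 2 - 1                # labels in {-1, +1}
x1 = y[:, None] + rng.normal(0, 2.0, (n, 1))     # noisy modality 1
x2 = y[:, None] + rng.normal(0, 1.0, (n, 1))     # cleaner modality 2

def train_logreg(X, y, steps=500, lr=0.1):
    """Plain gradient descent on the logistic loss."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        margins = y * (X @ w)
        grad = -(y[:, None] * X / (1.0 + np.exp(margins))[:, None]).mean(axis=0)
        w -= lr * grad
    return w

def accuracy(X, y, w):
    return float((np.sign(X @ w) == y).mean())

X_joint = np.hstack([x1, x2])
for name, X in [("modality 1 only", x1), ("modality 2 only", x2), ("joint", X_joint)]:
    print(name, accuracy(X, y, train_logreg(X, y)))
```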

Recent publications:

  1. Y. Huang, J. Lin, C. Zhou, H. Yang, L. Huang, “Modality Competition: What Makes Joint Training of Multi-modal Network Fail in Deep Learning? (Provably),” Proceedings of the 39th International Conference on Machine Learning (ICML), July 2022.

  2. Y. Huang, C. Du, Z. Xue, X. Chen, H. Zhao, L. Huang, “What Makes Multimodal Learning Better than Single (Provably),” Proceedings of the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS), December 2021.

Learning-augmented Network Optimization

Existing network optimization results have mostly focused on designing algorithms either based on full a priori statistical knowledge of the system, or based on stochastic approximation for systems with zero prior knowledge. These two scenarios, though very general, do not explicitly capture the role of information learning in control and do not reap its potential benefits. This omission often leads to a mismatch between algorithms in the literature and practical control schemes, where system information (data) is constantly collected and incorporated into operations.

In this research thrust, we are interested in (i) quantifying fundamental benefits and limits of learning in network optimization, and (ii) designing simple and practical algorithms for achieving the full benefits.
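As a rough illustration of learning-aided control, the sketch below runs a max-weight-style scheduler over two queues whose service rates are unknown and are replaced by running empirical estimates that improve as data is collected. The arrival and service probabilities and the scheduler itself are hypothetical illustrations, not the algorithms from the papers listed below.

```python
import numpy as np

rng = np.random.default_rng(2)
arrival_p = [0.30, 0.25]     # per-slot arrival probabilities (unknown to the controller)
service_p = [0.9, 0.5]       # per-slot success probabilities when a queue is served

queues = np.zeros(2)
succ, tries = np.ones(2), np.ones(2)   # counts behind the empirical service-rate estimates

for t in range(10000):
    queues += rng.binomial(1, arrival_p)          # arrivals

    est_rate = succ / tries                       # learned from observed data so far
    serve = int(np.argmax(queues * est_rate))     # max-weight-style decision with learned rates

    success = rng.binomial(1, service_p[serve])   # attempt service on the chosen queue
    tries[serve] += 1
    succ[serve] += success
    if success and queues[serve] > 0:
        queues[serve] -= 1

print("backlogs:", queues, "estimated service rates:", succ / tries)
```

The point of the sketch is that the controller's decisions improve as more data is collected; quantifying how much such learning can improve performance, and at what cost, is the focus of this thrust.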

Recent publications:

  1. P. Hu, L. Pan, Y. Chen, Z. Fang, L. Huang, “Effective Multi-User Delay-Constrained Scheduling with Deep Recurrent Reinforcement Learning,” Proceedings of the 23rd ACM International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing (MobiHoc), October 2022.

  2. J. Huang and L. Huang, “Robust Wireless Scheduling under Arbitrary Channel Dynamics and Feedback Delay,” Proceedings of International Teletraffic Congress (ITC), August 2021. (Invited Paper)

  3. Y. Yu, J. Wu and L. Huang, “Double Quantization for Communication-Efficient Distributed Optimization,” Proceedings of the Thirty-third Conference on Neural Information Processing Systems (NeurIPS), December 2019.

  4. L. Huang, M. Chen, and Y. Liu, “Learning-aided Stochastic Network Optimization with Imperfect State Prediction,” Proceedings of the 18th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), July 2017.

  5. L. Huang, “The Value-of-Information in Matching with Queues,” Proceedings of the 16th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), June 2015.

  6. L. Huang, “Receding Learning-aided Control in Stochastic Networks,” IFIP Performance (Performance), October 2015.

  7. L. Huang, X. Liu, and X. Hao, “The Power of Online Learning in Stochastic Network Optimization,” Proceedings of ACM Sigmetrics (Sigmetrics full paper), June 2014.

  8. L. Huang, S. Zhang, M. Chen, and X. Liu, “When Backpressure Meets Predictive Scheduling,” Proceedings of the 15th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), August 2014. (Best Paper Candidate)
  8. L. Huang, S. Zhang, M. Chen, and X. Liu ‘‘When Backpressure meets Predictive Scheduling,’’ Proceedings of 15th ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), August 2014. (Best Paper Candidate)