K Zhang, Z Yang, T Başar - Handbook of reinforcement learning and …, 2021 - Springer
Recent years have witnessed significant advances in reinforcement learning (RL), which has registered tremendous success in solving various sequential decision-making problems …
Abstract Proximal Policy Optimization (PPO) is a ubiquitous on-policy reinforcement learning algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent …
The advances in reinforcement learning have recorded sublime success in various domains. Although the multi-agent domain has been overshadowed by its single-agent counterpart …
Large sequence models (SM) such as GPT series and BERT have displayed outstanding performance and generalization capabilities in natural language process, vision and …
We explore value-based multi-agent reinforcement learning (MARL) in the popular paradigm of centralized training with decentralized execution (CTDE). CTDE has an …
J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As AI systems grow more capable, the potential large-scale risks associated with misaligned AI …
Most recently developed approaches to cooperative multi-agent reinforcement learning in the\emph {centralized training with decentralized execution} setting involve estimating a …
Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org
Following the remarkable success of the AlphaGO series, 2019 was a booming year that witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …
Y Bai, H Zhao, X Zhang, Z Chang… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
Unmanned aerial vehicle (UAV)-based wireless networks have received increasing research interest in recent years and are gradually being utilized in various aspects of our …