Efficient Average Reward Reinforcement Learning Using Constant Shifting Values S Yang, Y Gao, B An, H Wang, X Chen Proceedings of the AAAI Conference on Artificial Intelligence 30 (1), 2016 | 31 | 2016 |
An Optimal Algorithm for the Stochastic Bandits While Knowing the Near-Optimal Mean Reward S Yang, Y Gao IEEE Transactions on Neural Networks and Learning Systems 32 (5), 2285-2291, 2021 | 12 | 2021 |
A Contextual Bandit Approach to Personalized Online Recommendation via Sparse Interactions C Zhang, H Wang, S Yang, Y Gao Advances in Knowledge Discovery and Data Mining: 23rd Pacific-Asia …, 2019 | 11 | 2019 |
Contextual Bandits With Hidden Features to Online Recommendation via Sparse Interactions S Yang, H Wang, C Zhang, Y Gao IEEE Intelligent Systems 35 (5), 62-72, 2020 | 10 | 2020 |
New Galois Hulls of Generalized Reed-Solomon Codes Y Wu, C Li, S Yang Finite Fields and Their Applications 83, 102084, 2022 | 8 | 2022 |
Effective Interpretable Policy Distillation via Critical Experiences Identification X Liu, S Liu, B An, Y Gao, S Yang, W Li IEEE Intelligent Systems, 2023 | 5 | 2023 |
Incremental Nonnegative Matrix Factorization Based on Matrix Sketching and k-means Clustering C Zhang, H Wang, S Yang, Y Gao Intelligent Data Engineering and Automated Learning–IDEAL 2016: 17th …, 2016 | 5 | 2016 |
Learning Explicit Credit Assignment for Cooperative Multi-Agent Reinforcement Learning via Polarization Policy Gradient W Chen, W Li, X Liu, S Yang, Y Gao Proceedings of the AAAI Conference on Artificial Intelligence, 2023 | 4 | 2023 |
WToE: Learning When to Explore in Multi-Agent Reinforcement Learning S Dong, H Mao, S Yang, Z Shengyu, L Wenbin, H Jianye, Y Gao IEEE Transactions on Cybernetics, 2023 | 3 | 2023 |
Modified Retrace for Off-Policy Temporal Difference Learning X Chen, X Ma, Y Li, G Yang, S Yang, Y Gao 39th Conference On Uncertainty in Artificial Intelligence, 2023 | 3 | 2023 |
Keeping Minimal Experience to Achieve Efficient Interpretable Policy Distillation X Liu, S Liu, W Li, S Yang, Y Gao arXiv preprint arXiv:2203.00822, 2022 | 2 | 2022 |
An Optimal Algorithm for the Stochastic Bandits with Knowing Near-optimal Mean Reward S Yang, H Wang, Y Gao, X Chen Proceedings of the 17th International Conference on Autonomous Agents and …, 2018 | 2 | 2018 |
Online Attentive Kernel-based Temporal Difference Learning X Chen, G Yang, S Yang, H Wang, S Dong, Y Gao Knowledge-Based Systems 278, 110902, 2023 | 1 | 2023 |
Modeling rationality: Toward better performance against unknown agents in sequential games Z Ge, S Yang, P Tian, Z Chen, Y Gao IEEE Transactions on Cybernetics 54 (5), 2966-2977, 2022 | 1 | 2022 |
Online attentive kernel-based temporal difference learning G Yang, X Chen, S Yang, H Wang, S Dong, Y Gao arXiv preprint arXiv:2201.09065, 2022 | 1 | 2022 |
Learning Credit Assignment for Cooperative Reinforcement Learning W Chen, W Li, X Liu, S Yang arXiv preprint arXiv:2210.05367, 2022 | 1 | 2022 |
Decentralized Counterfactual Value with Threat Detection for Multi-Agent Reinforcement Learning in Mixed Cooperative and Competitive Environments S Dong, C Li, S Yang, W Li, Y Gao Expert Systems with Applications 257, 125116, 2024 | | 2024 |
Egoism, Utilitarianism and Egalitarianism in Multi-Agent Reinforcement Learning S Dong, C Li, S Yang, B An, W Li, Y Gao Neural Networks 178, 106544, 2024 | | 2024 |
Selective Policy Transfer in Multi-Agent Systems with Sparse Interactions Y Zhuang, Y Liu, S Yang, Y Gao Knowledge-Based Systems 300, 112031, 2024 | | 2024 |
A Survey of Solutions and Applications for Mixed Game Problems S Dong, C Li, G Yang, Z Ge, H Cao, W Chen, S Yang, X Chen, W Li, ... Journal of Software (软件学报), 1-47, 2024 | | 2024 |