Title. Authors. Venue | Cited by | Year |
Greedification Operators for Policy Optimization: Investigating Forward and Reverse KL Divergences. A Chan, H Silva, S Lim, T Kozuno, AR Mahmood, M White. Journal of Machine Learning Research 23 (253), 1-79, 2022 | 23 | 2022 |
Actor-Expert: A Framework for using Action-Value Methods in Continuous Action Spaces. S Lim, A Joseph, L Le, Y Pan, M White. NeurIPS 2018 Deep Reinforcement Learning Workshop, https://arxiv.org/abs …, 2018 | 21* | 2018 |
Maximizing Information Gain in Partially Observable Environments via Prediction Rewards. Y Satsangi, S Lim, S Whiteson, F Oliehoek, M White. AAMAS 2020, 2020 | 16 | 2020 |
Actor-Expert: A Framework for using Q-learning in Continuous Action Spaces. S Lim. University of Alberta, 2019 | 14 | 2019 |
Greedy Actor-Critic: A New Conditional Cross-Entropy Method for Policy Improvement. S Neumann, S Lim, A Joseph, Y Pan, A White, M White. arXiv preprint arXiv:1810.09103, 2018 | 4 | 2018 |
An Empirical and Conceptual Categorization of Value-based Exploration Methods. N Yasui, S Lim, C Linke, A White, M White | 1 | 2019 |
Maximizing Information Gain in Partially Observable Environments via Prediction Rewards. S Lim, Y Satsangi, S Whiteson, FA Oliehoek, M White | | 2020 |