Yi Wan
Meta
Verified email at meta.com - Homepage
Title
Cited by
Year
Learning and planning in average-reward Markov decision processes
Y Wan, A Naik, RS Sutton
International Conference on Machine Learning, 10653-10662, 2021
Cited by 70 · 2021
Average-reward off-policy policy evaluation with function approximation
S Zhang, Y Wan, RS Sutton, S Whiteson
International Conference on Machine Learning, 12578-12588, 2021
Cited by 37 · 2021
Planning with expectation models
Y Wan, Z Abbas, A White, M White, RS Sutton
arXiv preprint arXiv:1904.01191, 2019
Cited by 29 · 2019
Off-policy maximum entropy reinforcement learning: Soft actor-critic with advantage weighted mixture policy (SAC-AWMP)
Z Hou, K Zhang, Y Wan, D Li, C Fu, H Yu
arXiv preprint arXiv:2002.02829, 2020
Cited by 18 · 2020
Towards evaluating adaptivity of model-based reinforcement learning methods
Y Wan, A Rahimi-Kalahroudi, J Rajendran, I Momennejad, S Chandar, ...
International Conference on Machine Learning, 22536-22561, 2022
Cited by 14 · 2022
Average-reward learning and planning with options
Y Wan, A Naik, R Sutton
Advances in Neural Information Processing Systems 34, 22758-22769, 2021
Cited by 12 · 2021
Model-based reinforcement learning with non-linear expectation models and stochastic environments
Y Wan, M Zaheer, M White, RS Sutton
FAIM Workshop on Prediction and Generative Modeling in Reinforcement …, 2018
Cited by 6 · 2018
Toward discovering options that achieve faster planning
Y Wan, RS Sutton
arXiv preprint arXiv:2205.12515, 2022
Cited by 4 · 2022
On convergence of average-reward off-policy control algorithms in weakly communicating MDPs
Y Wan, RS Sutton
arXiv preprint arXiv:2209.15141, 2022
Cited by 3 · 2022
Pearl: A Production-ready Reinforcement Learning Agent
Z Zhu, RS Braz, J Bhandari, D Jiang, Y Wan, Y Efroni, L Wang, R Xu, ...
arXiv preprint arXiv:2312.03814, 2023
Cited by 2 · 2023
Learning and Planning with the Average-Reward Formulation
Y Wan
Cited by 2 · 2023
The Emphatic Approach to Average-Reward Policy Evaluation
J He, Y Wan, AR Mahmood
Deep Reinforcement Learning Workshop NeurIPS 2022, 2022
Cited by 2 · 2022
On Convergence of Average-Reward Q-Learning in Weakly Communicating Markov Decision Processes
Y Wan, H Yu, RS Sutton
arXiv preprint arXiv:2408.16262, 2024
Cited by 1 · 2024
A Note on Stability in Asynchronous Stochastic Approximation without Communication Delays
H Yu, Y Wan, RS Sutton
arXiv preprint arXiv:2312.15091, 2023
Cited by 1 · 2023
Loosely consistent emphatic temporal-difference learning
J He, F Che, Y Wan, AR Mahmood
Uncertainty in Artificial Intelligence, 849-859, 2023
Cited by 1 · 2023
Planning with expectation models for control
K Kudashkina, Y Wan, A Naik, RS Sutton
arXiv preprint arXiv:2104.08543, 2021
Cited by 1 · 2021
Asynchronous Stochastic Approximation and Average-Reward Reinforcement Learning
H Yu, Y Wan, RS Sutton
arXiv preprint arXiv:2409.03915, 2024
2024
Reward Centering
A Naik, Y Wan, M Tomar, RS Sutton
arXiv preprint arXiv:2405.09999, 2024
2024
Discovering Options by Minimizing the Number of Composed Options to Solve Multiple Tasks
Y Wan, RS Sutton
Incremental Policy Gradients for Online Reinforcement Learning Control
K De Asis, A Chan, Y Wan, RS Sutton