Q-ensemble for offline RL: Don't scale the ensemble, scale the batch size

A Nikulin, V Kurenkov, D Tarasov… - … on Machine Learning, 2023 - proceedings.mlr.press

Despite the success of Random Network Distillation (RND) in various domains, it was shown
as not discriminative enough to be used as an uncertainty estimator for penalizing out-of …

Cited by 24 · Related articles · All 6 versions

[PDF] neurips.cc

Revisiting the minimalist approach to offline reinforcement learning

D Tarasov, V Kurenkov, A Nikulin… - Advances in Neural …, 2024 - proceedings.neurips.cc

Recent years have witnessed significant advancements in offline reinforcement learning
(RL), resulting in the development of numerous algorithms with varying degrees of …

Cited by 24 · Related articles · All 6 versions

[PDF] neurips.cc

Understanding, predicting and better resolving Q-value divergence in offline-RL

Y Yue, R Lu, B Kang, S Song… - Advances in Neural …, 2024 - proceedings.neurips.cc

The divergence of the Q-value estimation has been a prominent issue in offline reinforcement
learning (offline RL), where the agent has no access to real dynamics. Traditional beliefs …

Cited by 11 · Related articles · All 5 versions

[PDF] arxiv.org

XLand-MiniGrid: Scalable meta-reinforcement learning environments in JAX

A Nikulin, V Kurenkov, I Zisman, A Agarkov… - arXiv preprint arXiv …, 2023 - arxiv.org

We present XLand-MiniGrid, a suite of tools and grid-world environments for meta-
reinforcement learning research inspired by the diversity and depth of XLand and the …

Cited by 15 · Related articles · All 4 versions

[PDF] arxiv.org

R³: On-Device Real-Time Deep Reinforcement Learning for Autonomous Robotics

Z Li, A Samanta, Y Li, A Soltoggio… - 2023 IEEE Real-Time …, 2023 - ieeexplore.ieee.org

Autonomous robotic systems, like autonomous vehicles and robotic search and rescue,
require efficient on-device training for continuous adaptation of Deep Reinforcement …

Cited by 5 · Related articles · All 6 versions

[PDF] arxiv.org

NetworkGym: Reinforcement Learning Environments for Multi-Access Traffic Management in Network Simulation

M Haider, M Yin, M Zhang, A Gupta, J Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org

Mobile devices such as smartphones, laptops, and tablets can often connect to multiple
access networks (e.g., Wi-Fi, LTE, and 5G) simultaneously. Recent advancements facilitate …

Cited by 1 · Related articles · All 4 versions

[PDF] arxiv.org

Dataset Clustering for Improved Offline Policy Learning

Q Wang, Y Deng, FR Sanchez, K Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Offline policy learning aims to discover decision-making policies from previously-collected
datasets without additional online interactions with the environment. As the training dataset …

Cited by 2 · Related articles · All 2 versions
