Anti-exploration by random network distillation

A Nikulin, V Kurenkov, D Tarasov… - … on Machine Learning, 2023 - proceedings.mlr.press
Despite the success of Random Network Distillation (RND) in various domains, it was shown
as not discriminative enough to be used as an uncertainty estimator for penalizing out-of …

Revisiting the minimalist approach to offline reinforcement learning

D Tarasov, V Kurenkov, A Nikulin… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recent years have witnessed significant advancements in offline reinforcement learning
(RL), resulting in the development of numerous algorithms with varying degrees of …

Understanding, predicting and better resolving Q-value divergence in offline-RL

Y Yue, R Lu, B Kang, S Song… - Advances in Neural …, 2024 - proceedings.neurips.cc
The divergence of the Q-value estimation has been a prominent issue in offline reinforcement
learning (offline RL), where the agent has no access to real dynamics. Traditional beliefs …

XLand-MiniGrid: Scalable meta-reinforcement learning environments in JAX

A Nikulin, V Kurenkov, I Zisman, A Agarkov… - arXiv preprint arXiv …, 2023 - arxiv.org
We present XLand-MiniGrid, a suite of tools and grid-world environments for meta-
reinforcement learning research inspired by the diversity and depth of XLand and the …

R³: On-Device Real-Time Deep Reinforcement Learning for Autonomous Robotics

Z Li, A Samanta, Y Li, A Soltoggio… - 2023 IEEE Real-Time …, 2023 - ieeexplore.ieee.org
Autonomous robotic systems, like autonomous vehicles and robotic search and rescue,
require efficient on-device training for continuous adaptation of Deep Reinforcement …

NetworkGym: Reinforcement Learning Environments for Multi-Access Traffic Management in Network Simulation

M Haider, M Yin, M Zhang, A Gupta, J Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Mobile devices such as smartphones, laptops, and tablets can often connect to multiple
access networks (e.g., Wi-Fi, LTE, and 5G) simultaneously. Recent advancements facilitate …

Data-driven hierarchical learning approach for multi-point servo control of Pan–Tilt–Zoom cameras

HT Wang, XS Zhai, T Wen, ZD Yin, Y Yang - Engineering Applications of …, 2024 - Elsevier
Pan–Tilt–Zoom (PTZ) cameras, with their significant features of free rotation and
zoom, are widely used in areas such as border security, ecological conservation, emergency …

The role of deep learning regularizations on actors in offline RL

D Tarasov, A Surina, C Gulcehre - arXiv preprint arXiv:2409.07606, 2024 - arxiv.org
Deep learning regularization techniques, such as dropout, layer normalization, or weight
decay, are widely adopted in the construction of modern artificial neural networks, often …

Judgmentally adjusted Q-values based on Q-ensemble for offline reinforcement learning

W Liu, S Xiang, T Zhang, Y Han, X Guo… - Neural Computing and …, 2024 - Springer
Recent advancements in offline reinforcement learning (offline RL) have leveraged the Q-
ensemble approach to derive optimal policies from static datasets collected in the past. By …

Dataset Clustering for Improved Offline Policy Learning

Q Wang, Y Deng, FR Sanchez, K Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Offline policy learning aims to discover decision-making policies from previously-collected
datasets without additional online interactions with the environment. As the training dataset …