Anti-exploration by random network distillation

A Nikulin, V Kurenkov, D Tarasov… - … on Machine Learning, 2023 - proceedings.mlr.press
Despite the success of Random Network Distillation (RND) in various domains, it was shown
to be insufficiently discriminative to be used as an uncertainty estimator for penalizing out-of …

Revisiting the minimalist approach to offline reinforcement learning

D Tarasov, V Kurenkov, A Nikulin… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recent years have witnessed significant advancements in offline reinforcement learning
(RL), resulting in the development of numerous algorithms with varying degrees of …

Deep generative models for offline policy learning: Tutorial, survey, and perspectives on future directions

J Chen, B Ganguly, Y Xu, Y Mei, T Lan… - arXiv preprint arXiv …, 2024 - arxiv.org
Deep generative models (DGMs) have demonstrated great success across various domains,
particularly in generating texts, images, and videos using models trained from offline data …

Katakomba: tools and benchmarks for data-driven NetHack

V Kurenkov, A Nikulin, D Tarasov… - Advances in Neural …, 2024 - proceedings.neurips.cc
NetHack is known as the frontier of reinforcement learning research where learning-based
methods still need to catch up to rule-based solutions. One of the promising directions for a …

Distilling LLMs' Decomposition Abilities into Compact Language Models

D Tarasov, K Shridhar - arXiv preprint arXiv:2402.01812, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated proficiency in their reasoning abilities,
yet their large size presents scalability challenges and limits any further customization. In …

The role of deep learning regularizations on actors in offline RL

D Tarasov, A Surina, C Gulcehre - arXiv preprint arXiv:2409.07606, 2024 - arxiv.org
Deep learning regularization techniques, such as dropout, layer normalization, or weight
decay, are widely adopted in the construction of modern artificial neural networks, often …

Judgmentally adjusted Q-values based on Q-ensemble for offline reinforcement learning

W Liu, S Xiang, T Zhang, Y Han, X Guo… - Neural Computing and …, 2024 - Springer
Recent advancements in offline reinforcement learning (offline RL) have leveraged the Q-
ensemble approach to derive optimal policies from static datasets collected in the past. By …

A survey of offline reinforcement learning methods based on representation learning

X Wang, R Wang, Y Cheng - Acta Automatica Sinica, 2024 - aas.net.cn
Reinforcement learning learns optimal policies through online interaction between an agent
and its environment, and in recent years it has become an important means of solving
perception and decision-making problems in complex environments. However, collecting data
online may raise issues of safety, time, or cost …

Learning on One Mode: Addressing Multi-Modality in Offline Reinforcement Learning

M Wang, Y Jin, G Montana - arXiv preprint arXiv:2412.03258, 2024 - arxiv.org
Offline reinforcement learning (RL) seeks to learn optimal policies from static datasets
without interacting with the environment. A common challenge is handling multi-modal …

An approach to improve agent learning via guaranteeing goal reaching in all episodes

P Osinenko, G Yaremenko, G Malaniya… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement learning is commonly concerned with problems of maximizing accumulated
rewards in Markov decision processes. Oftentimes, a certain goal state or a subset of the …