Anti-exploration by random network distillation

A Nikulin, V Kurenkov, D Tarasov… - … on Machine Learning, 2023 - proceedings.mlr.press
Despite the success of Random Network Distillation (RND) in various domains, it was shown
to be insufficiently discriminative to be used as an uncertainty estimator for penalizing out-of …

Revisiting the minimalist approach to offline reinforcement learning

D Tarasov, V Kurenkov, A Nikulin… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recent years have witnessed significant advancements in offline reinforcement learning
(RL), resulting in the development of numerous algorithms with varying degrees of …

Deep generative models for offline policy learning: Tutorial, survey, and perspectives on future directions

J Chen, B Ganguly, Y Xu, Y Mei, T Lan… - arXiv preprint arXiv …, 2024 - arxiv.org
Deep generative models (DGMs) have demonstrated great success across various domains,
particularly in generating texts, images, and videos using models trained from offline data …

Katakomba: tools and benchmarks for data-driven NetHack

V Kurenkov, A Nikulin, D Tarasov… - Advances in Neural …, 2024 - proceedings.neurips.cc
NetHack is known as the frontier of reinforcement learning research where learning-based
methods still need to catch up to rule-based solutions. One of the promising directions for a …

Distilling LLMs' Decomposition Abilities into Compact Language Models

D Tarasov, K Shridhar - arXiv preprint arXiv:2402.01812, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated proficiency in their reasoning abilities,
yet their large size presents scalability challenges and limits any further customization. In …

The role of deep learning regularizations on actors in offline RL

D Tarasov, A Surina, C Gulcehre - arXiv preprint arXiv:2409.07606, 2024 - arxiv.org
Deep learning regularization techniques, such as dropout, layer normalization, or weight
decay, are widely adopted in the construction of modern artificial neural networks, often …

Judgmentally adjusted Q-values based on Q-ensemble for offline reinforcement learning

W Liu, S Xiang, T Zhang, Y Han, X Guo… - Neural Computing and …, 2024 - Springer
Recent advancements in offline reinforcement learning (offline RL) have leveraged the Q-
ensemble approach to derive optimal policies from static datasets collected in the past. By …

A survey of offline reinforcement learning methods based on representation learning

X Wang, R Wang, Y Cheng - Acta Automatica Sinica, 2024 - aas.net.cn
Reinforcement learning learns optimal policies through online interaction between an agent
and its environment, and in recent years it has become an important means of solving
perception and decision-making problems in complex environments. However, collecting data
online may raise issues of safety, time, or cost …

Learning on One Mode: Addressing Multi-Modality in Offline Reinforcement Learning

M Wang, Y Jin, G Montana - arXiv preprint arXiv:2412.03258, 2024 - arxiv.org
Offline reinforcement learning (RL) seeks to learn optimal policies from static datasets
without interacting with the environment. A common challenge is handling multi-modal …

An approach to improve agent learning via guaranteeing goal reaching in all episodes

P Osinenko, G Yaremenko, G Malaniya… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement learning is commonly concerned with problems of maximizing accumulated
rewards in Markov decision processes. Oftentimes, a certain goal state or a subset of the …