Awac: Accelerating online reinforcement learning with offline datasets

M Polese, L Bonati, S D'oro, S Basagni… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org

The Open Radio Access Network (RAN) and its embodiment through the O-RAN Alliance
specifications are poised to revolutionize the telecom ecosystem. O-RAN promotes …

被引用次数：432 相关文章所有 9 个版本

[PDF] ieee.org

A survey on offline reinforcement learning: Taxonomy, review, and open problems

RF Prudencio, MROA Maximo… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

With the widespread adoption of deep learning, reinforcement learning (RL) has
experienced a dramatic increase in popularity, scaling to previously intractable problems …

被引用次数：230 相关文章所有 9 个版本

[PDF] arxiv.org

Diffusion policies as an expressive policy class for offline reinforcement learning

Z Wang, JJ Hunt, M Zhou - arXiv preprint arXiv:2208.06193, 2022 - arxiv.org

Offline reinforcement learning (RL), which aims to learn an optimal policy using a previously
collected static dataset, is an important paradigm of RL. Standard RL methods often perform …

被引用次数：210 相关文章所有 6 个版本

[PDF] jair.org Full View

A survey of zero-shot generalisation in deep reinforcement learning

R Kirk, A Zhang, E Grefenstette, T Rocktäschel - Journal of Artificial …, 2023 - jair.org

The study of zero-shot generalisation (ZSG) in deep Reinforcement Learning (RL) aims to
produce RL algorithms whose policies generalise well to novel unseen situations at …

被引用次数：321 相关文章所有 9 个版本

[PDF] neurips.cc

Decision transformer: Reinforcement learning via sequence modeling

L Chen, K Lu, A Rajeswaran, K Lee… - Advances in neural …, 2021 - proceedings.neurips.cc

We introduce a framework that abstracts Reinforcement Learning (RL) as a sequence
modeling problem. This allows us to draw upon the simplicity and scalability of the …

被引用次数：1349 相关文章所有 11 个版本

[PDF] neurips.cc

A minimalist approach to offline reinforcement learning

S Fujimoto, SS Gu - Advances in neural information …, 2021 - proceedings.neurips.cc

Offline reinforcement learning (RL) defines the task of learning from a fixed batch of data.
Due to errors in value estimation from out-of-distribution actions, most offline RL algorithms …

被引用次数：665 相关文章所有 6 个版本

[PDF] neurips.cc

Offline reinforcement learning as one big sequence modeling problem

M Janner, Q Li, S Levine - Advances in neural information …, 2021 - proceedings.neurips.cc

Reinforcement learning (RL) is typically viewed as the problem of estimating single-step
policies (for model-free RL) or single-step models (for model-based RL), leveraging the …

被引用次数：633 相关文章所有 9 个版本

[PDF] arxiv.org

Training diffusion models with reinforcement learning

K Black, M Janner, Y Du, I Kostrikov… - arXiv preprint arXiv …, 2023 - arxiv.org

Diffusion models are a class of flexible generative models trained with an approximation to
the log-likelihood objective. However, most use cases of diffusion models are not concerned …

被引用次数：109 相关文章所有 6 个版本

[PDF] mlr.press

Is pessimism provably efficient for offline rl?

Y Jin, Z Yang, Z Wang - International Conference on …, 2021 - proceedings.mlr.press

We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …

被引用次数：392 相关文章所有 7 个版本

[PDF] neurips.cc

Cal-ql: Calibrated offline rl pre-training for efficient online fine-tuning

M Nakamoto, S Zhai, A Singh… - Advances in …, 2024 - proceedings.neurips.cc

A compelling use case of offline reinforcement learning (RL) is to obtain a policy initialization
from existing datasets followed by fast online fine-tuning with limited interaction. However …

被引用次数：61 相关文章所有 7 个版本

高级搜索

QQ 群