Self-consistent trajectory autoencoder: Hierarchical reinforcement learning with trajectory...

S Pateria, B Subagdja, A Tan, C Quek - ACM Computing Surveys (CSUR …, 2021 - dl.acm.org

Hierarchical Reinforcement Learning (HRL) enables autonomous decomposition of
challenging long-horizon decision-making tasks into simpler subtasks. During the past …

Cited by 359 · Related articles · All 5 versions

[PDF] mdpi.com

An information-theoretic perspective on intrinsic motivation in reinforcement learning: A survey

A Aubret, L Matignon, S Hassas - Entropy, 2023 - mdpi.com

The reinforcement learning (RL) research area is very active, with an important number of
new contributions, especially considering the emergent field of deep RL (DRL). However, a …

Cited by 32 · Related articles · All 10 versions

[PDF] neurips.cc

Offline reinforcement learning as one big sequence modeling problem

M Janner, Q Li, S Levine - Advances in neural information …, 2021 - proceedings.neurips.cc

Reinforcement learning (RL) is typically viewed as the problem of estimating single-step
policies (for model-free RL) or single-step models (for model-based RL), leveraging the …

Cited by 681 · Related articles · All 9 versions

[PDF] neurips.cc

Deep hierarchical planning from pixels

D Hafner, KH Lee, I Fischer… - Advances in Neural …, 2022 - proceedings.neurips.cc

Intelligent agents need to select long sequences of actions to solve complex tasks. While
humans easily break down tasks into subgoals and reach them through millions of muscle …

Cited by 73 · Related articles · All 7 versions

[PDF] arxiv.org

VariBAD: A very good method for Bayes-adaptive deep RL via meta-learning

L Zintgraf, K Shiarlis, M Igl, S Schulze, Y Gal… - arXiv preprint arXiv …, 2019 - arxiv.org

Trading off exploration and exploitation in an unknown environment is key to maximising
expected return during learning. A Bayes-optimal policy, which does so optimally, conditions …

Cited by 272 · Related articles · All 6 versions

[PDF] arxiv.org

Efficient exploration via state marginal matching

L Lee, B Eysenbach, E Parisotto, E Xing… - arXiv preprint arXiv …, 2019 - arxiv.org

Exploration is critical to a reinforcement learning agent's performance in its given
environment. Prior exploration methods are often based on using heuristic auxiliary …

Cited by 269 · Related articles · All 3 versions

[PDF] nsf.gov

Augmenting reinforcement learning with behavior primitives for diverse manipulation tasks

S Nasiriany, H Liu, Y Zhu - 2022 International Conference on …, 2022 - ieeexplore.ieee.org

Realistic manipulation tasks require a robot to interact with an environment with a prolonged
sequence of motor actions. While deep reinforcement learning methods have recently …

Cited by 101 · Related articles · All 6 versions

[PDF] ieee.org

Robot motion planning in learned latent spaces

B Ichter, M Pavone - IEEE Robotics and Automation Letters, 2019 - ieeexplore.ieee.org

This letter presents latent sampling-based motion planning (L-SBMP), a methodology
toward computing motion plans for complex robotic systems by learning a plannable latent …

Cited by 196 · Related articles · All 6 versions

[PDF] arxiv.org

Controllability-aware unsupervised skill discovery

S Park, K Lee, Y Lee, P Abbeel - arXiv preprint arXiv:2302.05103, 2023 - arxiv.org

One of the key capabilities of intelligent agents is the ability to discover useful skills without
external supervision. However, the current unsupervised skill discovery methods are often …

Cited by 35 · Related articles · All 7 versions

[PDF] neurips.cc

On reward-free reinforcement learning with linear function approximation

R Wang, SS Du, L Yang… - Advances in neural …, 2020 - proceedings.neurips.cc

Reward-free reinforcement learning (RL) is a framework which is suitable for both the batch
RL setting and the setting where there are many reward functions of interest. During the …

Cited by 122 · Related articles · All 6 versions
