DemoDICE: Offline imitation learning with supplementary imperfect demonstrations

M Zare, PM Kebria, A Khosravi… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

In recent years, the development of robotics and artificial intelligence (AI) systems has been
nothing short of remarkable. As these systems continue to evolve, they are being utilized in …

Cited by 62 · Related articles · All 2 versions

[PDF] neurips.cc

CEIL: Generalized contextual imitation learning

J Liu, L He, Y Kang, Z Zhuang… - Advances in Neural …, 2023 - proceedings.neurips.cc

In this paper, we present ContExtual Imitation Learning (CEIL), a general and broadly
applicable algorithm for imitation learning (IL). Inspired by the formulation of hindsight …

Cited by 16 · Related articles · All 5 versions

[PDF] mlr.press

Discriminator-weighted offline imitation learning from suboptimal demonstrations

H Xu, X Zhan, H Yin, H Qin - International Conference on …, 2022 - proceedings.mlr.press

We study the problem of offline Imitation Learning (IL) where an agent aims to learn an
optimal expert behavior policy without additional online environment interactions. Instead …

Cited by 79 · Related articles · All 10 versions

[PDF] arxiv.org

Benchmarks and algorithms for offline preference-based reward learning

D Shin, AD Dragan, DS Brown - arXiv preprint arXiv:2301.01392, 2023 - arxiv.org

Learning a reward function from human preferences is challenging as it typically requires
having a high-fidelity simulator or using expensive and potentially unsafe actual physical …

Cited by 59 · Related articles · All 4 versions

[PDF] neurips.cc

Beyond uniform sampling: Offline reinforcement learning with imbalanced datasets

ZW Hong, A Kumar, S Karnik… - Advances in …, 2023 - proceedings.neurips.cc

Offline reinforcement learning (RL) enables learning a decision-making policy without
interaction with the environment. This makes it particularly beneficial in situations where …

Cited by 14 · Related articles · All 6 versions

[PDF] neurips.cc

Imitation learning from imperfection: Theoretical justifications and algorithms

Z Li, T Xu, Z Qin, Y Yu, ZQ Luo - Advances in Neural …, 2024 - proceedings.neurips.cc

Imitation learning (IL) algorithms excel in acquiring high-quality policies from expert data for
sequential decision-making tasks. But, their effectiveness is hampered when faced with …

Cited by 9 · Related articles · All 3 versions

[PDF] neurips.cc

Offline Goal-Conditioned Reinforcement Learning via f-Advantage Regression

JY Ma, J Yan, D Jayaraman… - Advances in neural …, 2022 - proceedings.neurips.cc

Offline goal-conditioned reinforcement learning (GCRL) promises general-purpose skill
learning in the form of reaching diverse goals from purely offline datasets. We propose …

Cited by 29 · Related articles · All 6 versions

[PDF] arxiv.org

State-of-the-art in robot learning for multi-robot collaboration: A comprehensive survey

B Wu, CS Suh - arXiv preprint arXiv:2408.11822, 2024 - arxiv.org

With the continuous breakthroughs in core technology, the dawn of large-scale integration of
robotic systems into daily human life is on the horizon. Multi-robot systems (MRS) built on …

Cited by 3 · Related articles · All 2 versions

[PDF] neurips.cc

Survival instinct in offline reinforcement learning

A Li, D Misra, A Kolobov… - Advances in neural …, 2024 - proceedings.neurips.cc

We present a novel observation about the behavior of offline reinforcement learning (RL)
algorithms: on many benchmark datasets, offline RL can produce well-performing and safe …

Cited by 19 · Related articles · All 5 versions

[PDF] mlr.press

MAHALO: Unifying offline reinforcement learning and imitation learning from observations

A Li, B Boots, CA Cheng - International Conference on …, 2023 - proceedings.mlr.press

We study a new paradigm for sequential decision making, called offline policy learning from
observations (PLfO). Offline PLfO aims to learn policies using datasets with substandard …

Cited by 18 · Related articles · All 6 versions
