MHER: Model-based hindsight experience replay

R Yang, L Yong, X Ma, H Hu… - … on Machine Learning, 2023 - proceedings.mlr.press

Offline goal-conditioned RL (GCRL) offers a way to train general-purpose agents from fully
offline datasets. In addition to being conservative within the dataset, the generalization …

Cited by 21 · Related articles · All 7 versions

[PDF] arxiv.org

Rewards-in-context: Multi-objective alignment of foundation models with dynamic preference adjustment

R Yang, X Pan, F Luo, S Qiu, H Zhong, D Yu… - arXiv preprint arXiv …, 2024 - arxiv.org

We consider the problem of multi-objective alignment of foundation models with human
preferences, which is a critical step towards helpful and harmless AI systems. However, it is …

Cited by 32 · Related articles · All 4 versions

[PDF] neurips.cc

Imitating past successes can be very suboptimal

B Eysenbach, S Udatha… - Advances in Neural …, 2022 - proceedings.neurips.cc

Prior work has proposed a simple strategy for reinforcement learning (RL): label experience
with the outcomes achieved in that experience, and then imitate the relabeled experience …

Cited by 17 · Related articles · All 6 versions

[PDF] sqz.ac.cn

Efficient bimanual handover and rearrangement via symmetry-aware actor-critic learning

Y Li, C Pan, H Xu, X Wang, Y Wu - 2023 IEEE International …, 2023 - ieeexplore.ieee.org

Bimanual manipulation is important for building intelligent robots that unlock richer skills
than single arms. We consider a multi-object bimanual rearrangement task, where a …

Cited by 13 · Related articles · All 3 versions

[PDF] mlr.press

A connection between one-step RL and critic regularization in reinforcement learning

B Eysenbach, M Geist, S Levine… - International …, 2023 - proceedings.mlr.press

As with any machine learning problem with limited data, effective offline RL algorithms
require careful regularization to avoid overfitting. One class of methods, known as one-step …

Cited by 5 · Related articles · All 5 versions

[PDF] arxiv.org

GOPlan: Goal-conditioned offline reinforcement learning by planning with learned models

M Wang, R Yang, X Chen, H Sun, M Fang… - arXiv preprint arXiv …, 2023 - arxiv.org

Offline Goal-Conditioned RL (GCRL) offers a feasible paradigm for learning general-purpose
policies from diverse and multi-task offline datasets. Despite notable recent …

Cited by 5 · Related articles · All 4 versions

[PDF] arxiv.org

ReRoGCRL: Representation-based robustness in goal-conditioned reinforcement learning

X Yin, S Wu, J Liu, M Fang, X Zhao, X Huang… - arXiv preprint arXiv …, 2023 - arxiv.org

While Goal-Conditioned Reinforcement Learning (GCRL) has gained attention, its
algorithmic robustness, particularly against adversarial perturbations, remains unexplored …

Cited by 3 · Related articles · All 3 versions

[PDF] aaai.org

Goal-conditioned Q-learning as knowledge distillation

A Levine, S Feizi - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org

Many applications of reinforcement learning can be formalized as goal-conditioned
environments, where, in each episode, there is a "goal" that affects the rewards obtained …

Cited by 4 · Related articles · All 6 versions

[PDF] arxiv.org

Imaginary hindsight experience replay: Curious model-based learning for sparse reward tasks

R McCarthy, Q Wang, SJ Redmond - arXiv preprint arXiv:2110.02414, 2021 - arxiv.org

Model-based reinforcement learning is a promising learning strategy for practical robotic
applications due to its improved data-efficiency versus model-free counterparts. However …

Cited by 12 · Related articles · All 2 versions

[PDF] springer.com

Goal-conditioned offline reinforcement learning through state space partitioning

M Wang, Y Jin, G Montana - Machine Learning, 2024 - Springer

Offline reinforcement learning (RL) aims to create policies for sequential decision-making
using exclusively offline datasets. This presents a significant challenge, especially when …

Cited by 5 · Related articles · All 5 versions
