Simple emergent action representations from multi-task policy training

Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

A Rame, G Couairon, C Dancette… - Advances in …, 2024 - proceedings.neurips.cc

Foundation models are first pre-trained on vast unsupervised datasets and then fine-tuned
on labeled data. Reinforcement learning, notably from human feedback (RLHF), can further …

Cited by 94 · Related articles · All 7 versions

TACO: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning

R Zheng, X Wang, Y Sun, S Ma… - Advances in …, 2024 - proceedings.neurips.cc

Despite recent progress in reinforcement learning (RL) from raw pixel data, sample
inefficiency continues to present a substantial obstacle. Prior works have attempted to …

Cited by 39 · Related articles · All 6 versions

Enhancing visual reinforcement learning with State–Action Representation

M Yan, J Lyu, X Li - Knowledge-Based Systems, 2024 - Elsevier

Despite the remarkable progress made in visual reinforcement learning (RL) in recent years,
sample inefficiency remains a major challenge. Many existing approaches attempt to …

Cited by 2 · Related articles · All 3 versions

Attention-based policy distillation for UAV simultaneous target tracking and obstacle avoidance

L Xu, T Wang, J Wang, J Liu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Nowadays, deep reinforcement learning (DRL) has made remarkable achievements in the
research of unmanned aerial vehicle (UAV) applications. However, much of the current …

Cited by 4 · Related articles
