Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards

A Rame, G Couairon, C Dancette… - Advances in …, 2024 - proceedings.neurips.cc
Foundation models are first pre-trained on vast unsupervised datasets and then fine-tuned
on labeled data. Reinforcement learning, notably from human feedback (RLHF), can further …

TACO: Temporal Latent Action-Driven Contrastive Loss for Visual Reinforcement Learning

R Zheng, X Wang, Y Sun, S Ma… - Advances in …, 2024 - proceedings.neurips.cc
Despite recent progress in reinforcement learning (RL) from raw pixel data, sample
inefficiency continues to present a substantial obstacle. Prior works have attempted to …

Enhancing visual reinforcement learning with State–Action Representation

M Yan, J Lyu, X Li - Knowledge-Based Systems, 2024 - Elsevier
Despite the remarkable progress made in visual reinforcement learning (RL) in recent years,
sample inefficiency remains a major challenge. Many existing approaches attempt to …

Attention-based policy distillation for UAV simultaneous target tracking and obstacle avoidance

L Xu, T Wang, J Wang, J Liu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Nowadays, deep reinforcement learning (DRL) has made remarkable achievements in the
research of unmanned aerial vehicle (UAV) applications. However, much of the current …

Transformer-based state-action-reward prediction representation learning

Liu Minsong (刘民颂), Zhu Yuanheng (朱圆恒), Zhao Dongbin (赵冬斌) - Acta Automatica Sinica (自动化学报), 2025 - aas.net.cn
To improve performance and sample efficiency on complex continuous control tasks with high-dimensional
action spaces, a representation learning framework is proposed (Transformer-based state-action-reward prediction …

DCT: Dual Channel Training of Action Embeddings for Reinforcement Learning with Large Discrete Action Spaces

P Pathakota, H Meisheri, H Khadilkar - arXiv preprint arXiv:2306.15913, 2023 - arxiv.org
The ability to learn robust policies while generalizing over large discrete action spaces is an
open challenge for intelligent systems, especially in noisy environments that face the curse …

Self-Evolution Policy Learning: Leveraging Basic Tasks to Complex Ones

Q Chen, W Xiao, Y Li, X Luo - 2024 5th International Seminar …, 2024 - ieeexplore.ieee.org
Machine autonomy in automatic control often relies on reinforcement learning (RL) for robot
control. However, RL agents struggle with complex long-sequence tasks due to catastrophic …