CPPO: Continual Learning for Reinforcement Learning with Human Feedback

H Zhang, Y Lei, L Gui, M Yang, Y He… - The Twelfth …, 2024 - openreview.net
The approach of Reinforcement Learning from Human Feedback (RLHF) is widely used for
enhancing pre-trained Language Models (LM), enabling them to better align with human …

Online DPO: Online Direct Preference Optimization with Fast-Slow Chasing

B Qi, P Li, F Li, J Gao, K Zhang, B Zhou - arXiv preprint arXiv:2406.05534, 2024 - arxiv.org
Direct Preference Optimization (DPO) improves the alignment of large language models
(LLMs) with human values by training directly on human preference datasets, eliminating the …

Incremental Learning of Retrievable Skills For Efficient Continual Task Adaptation

D Lee, M Yoo, WK Kim, W Choi, H Woo - arXiv preprint arXiv:2410.22658, 2024 - arxiv.org
Continual Imitation Learning (CiL) involves extracting and accumulating task knowledge
from demonstrations across multiple stages and tasks to achieve a multi-task policy. With …

Hi-Core: Hierarchical Knowledge Transfer for Continual Reinforcement Learning

C Pan, X Yang, H Wang, W Wei, T Li - arXiv preprint arXiv:2401.15098, 2024 - arxiv.org
Continual reinforcement learning (CRL) empowers RL agents with the ability to learn from a
sequence of tasks, preserving previous knowledge and leveraging it to facilitate future …

Continual Domain Randomization

J Josifovski, S Auddy, M Malmir, J Piater… - arXiv preprint arXiv …, 2024 - arxiv.org
Domain Randomization (DR) is commonly used for sim2real transfer of reinforcement
learning (RL) policies in robotics. Most DR approaches require a simulator with a fixed set of …

Effect of Optimizer, Initializer, and Architecture of Hypernetworks on Continual Learning from Demonstration

S Auddy, S Bergner, J Piater - arXiv preprint arXiv:2401.00524, 2023 - arxiv.org
In continual learning from demonstration (CLfD), a robot learns a sequence of real-world
motion skills continually from human demonstrations. Recently, hypernetworks have been …

Effective State Space Exploration with Phase State Graph Generation and Goal-based Path Planning

S Zhang, J Hu, X Du, Z Yang, Y Yu… - 2024 International Joint …, 2024 - ieeexplore.ieee.org
Exploring the state space efficiently is a crucial problem in reinforcement learning as it holds
significant importance for learning optimal policies. One effective approach involves learning …

ARC-RL: Self-Evolution Continual Reinforcement Learning via Action Representation Space

C Pan, J Liu, Y Li, LB Xiong, F Min, W Wei, T Li, X Yang - openreview.net
Continual Reinforcement Learning (CRL) is a powerful tool that enables agents to learn a
sequence of tasks, accumulating knowledge learned in the past and using it for …

Effect of Optimizer, Initializer, and Architecture of Hypernetworks

S Auddy, S Bergner, J Piater - … Robotics Forum 2024: 15th ERF, Volume … - books.google.com
In continual learning from demonstration (CLfD), a robot learns a sequence of real-world
motion skills continually from human demonstrations. Recently, hypernetworks have been …