B Qi, P Li, F Li, J Gao, K Zhang, B Zhou - arXiv preprint arXiv:2406.05534, 2024 - arxiv.org
Direct Preference Optimization (DPO) improves the alignment of large language models (LLMs) with human values by training directly on human preference datasets, eliminating the …
Continual Imitation Learning (CiL) involves extracting and accumulating task knowledge from demonstrations across multiple stages and tasks to achieve a multi-task policy. With …
C Pan, X Yang, H Wang, W Wei, T Li - arXiv preprint arXiv:2401.15098, 2024 - arxiv.org
Continual reinforcement learning (CRL) empowers RL agents with the ability to learn from a sequence of tasks, preserving previous knowledge and leveraging it to facilitate future …
Domain Randomization (DR) is commonly used for sim2real transfer of reinforcement learning (RL) policies in robotics. Most DR approaches require a simulator with a fixed set of …
S Auddy, S Bergner, J Piater - arXiv preprint arXiv:2401.00524, 2023 - arxiv.org
In continual learning from demonstration (CLfD), a robot learns a sequence of real-world motion skills continually from human demonstrations. Recently, hypernetworks have been …
S Zhang, J Hu, X Du, Z Yang, Y Yu… - 2024 International Joint …, 2024 - ieeexplore.ieee.org
Exploring the state space efficiently is a crucial problem in reinforcement learning as it holds significant importance for learning optimal policies. One effective approach involves learning …
C Pan, J Liu, Y Li, LB Xiong, F Min, W Wei, T Li, X Yang - openreview.net
Continual Reinforcement Learning (CRL) is a powerful tool that enables agents to learn a sequence of tasks, accumulating knowledge learned in the past and using it for …
S Auddy, S Bergner, J Piater - … Robotics Forum 2024: 15th ERF, Volume … - books.google.com
In continual learning from demonstration (CLfD), a robot learns a sequence of real-world motion skills continually from human demonstrations. Recently, hypernetworks have been …