Game-theoretic robust reinforcement learning handles temporally-coupled perturbations

Y Liang, Y Sun, R Zheng, X Liu, B Eysenbach… - arXiv preprint arXiv …, 2023 - arxiv.org
Deploying reinforcement learning (RL) systems requires robustness to uncertainty and
model misspecification, yet prior robust RL methods typically only study noise introduced …

UVIS: Unsupervised Video Instance Segmentation

S Huang, S Suri, K Gupta… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video instance segmentation requires classifying, segmenting, and tracking every object
across video frames. Unlike existing approaches that rely on masks, boxes, or category labels …

COPlanner: Plan to Roll Out Conservatively but to Explore Optimistically for Model-Based RL

X Wang, R Zheng, Y Sun, R Jia, W Wongkamjan… - arXiv preprint arXiv …, 2023 - arxiv.org
Dyna-style model-based reinforcement learning contains two phases: model rollouts to
generate samples for policy learning and real environment exploration using current policy …

Premier-TACO: Pretraining multitask representation via temporal action-driven contrastive loss

R Zheng, Y Liang, X Wang, S Ma, H Daumé III… - arXiv preprint arXiv …, 2024 - arxiv.org
We present Premier-TACO, a multitask feature representation learning approach designed
to improve few-shot policy learning efficiency in sequential decision-making tasks. Premier …

Offline-Boosted Actor-Critic: Adaptively Blending Optimal Historical Behaviors in Deep Off-Policy RL

Y Luo, T Ji, F Sun, J Zhang, H Xu, X Zhan - arXiv preprint arXiv …, 2024 - arxiv.org
Off-policy reinforcement learning (RL) has achieved notable success in tackling many
complex real-world tasks, by leveraging previously collected data for policy learning …

Slow and Steady Wins the Race: Maintaining Plasticity with Hare and Tortoise Networks

H Lee, H Cho, H Kim, D Kim, D Min, J Choo… - arXiv preprint arXiv …, 2024 - arxiv.org
This study investigates the loss of generalization ability in neural networks, revisiting warm-
starting experiments from Ash & Adams. Our empirical analysis reveals that common …

Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn

H Tang, G Berseth - arXiv preprint arXiv:2409.04792, 2024 - arxiv.org
Deep neural networks provide Reinforcement Learning (RL) powerful function
approximators to address large-scale decision-making problems. However, these …

ACE: Off-Policy Actor-Critic with Causality-Aware Entropy Regularization

T Ji, Y Liang, Y Zeng, Y Luo, G Xu, J Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
The varying significance of distinct primitive behaviors during the policy learning process
has been overlooked by prior model-free RL algorithms. Leveraging this insight, we explore …

Higher Replay Ratio Empowers Sample-Efficient Multi-Agent Reinforcement Learning

L Xu, Z Liu, A Dockhorn, D Perez-Liebana… - arXiv preprint arXiv …, 2024 - arxiv.org
One of the notorious issues for Reinforcement Learning (RL) is poor sample efficiency.
Compared to single agent RL, the sample efficiency for Multi-Agent Reinforcement Learning …

Pretrained Visual Representations in Reinforcement Learning

E Williams, A Polydoros - arXiv preprint arXiv:2407.17238, 2024 - arxiv.org
Visual reinforcement learning (RL) has made significant progress in recent years, but the
choice of visual feature extractor remains a crucial design decision. This paper compares …