Interpretable reward redistribution in reinforcement learning: a causal approach

Y Zhang, Y Du, B Huang, Z Wang… - Advances in …, 2024 - proceedings.neurips.cc
A major challenge in reinforcement learning is to determine which state-action pairs are
responsible for future rewards that are delayed. Reward redistribution serves as a solution to …

A survey on causal reinforcement learning

Y Zeng, R Cai, F Sun, L Huang, Z Hao - arXiv preprint arXiv:2302.05209, 2023 - arxiv.org
While Reinforcement Learning (RL) achieves tremendous success in sequential
decision-making problems of many domains, it still faces key challenges of data inefficiency and the …

Exploration-driven representation learning in reinforcement learning

A Erraqabi, H Zhao, MC Machado… - ICML 2021 Workshop …, 2021 - openreview.net
Learning reward-agnostic representations is an emerging paradigm in reinforcement
learning. These representations can be leveraged for several purposes ranging from reward …

Distributional multivariate policy evaluation and exploration with the Bellman GAN

D Freirich, T Shimkin, R Meir… - … Conference on Machine …, 2019 - proceedings.mlr.press
The recently proposed distributional approach to reinforcement learning (DiRL) is centered
on learning the distribution of the reward-to-go, often referred to as the value distribution. In …

Diffusion models for reinforcement learning: A survey

Z Zhu, H Zhao, H He, Y Zhong, S Zhang, Y Yu… - arXiv preprint arXiv …, 2023 - arxiv.org
Diffusion models have emerged as a prominent class of generative models, surpassing
previous methods regarding sample quality and training stability. Recent works have shown …

Counterfactual credit assignment in model-free reinforcement learning

T Mesnard, T Weber, F Viola, S Thakoor… - arXiv preprint arXiv …, 2020 - arxiv.org
Credit assignment in reinforcement learning is the problem of measuring an action's
influence on future rewards. In particular, this requires separating skill from luck, i.e. …

Learning long-term reward redistribution via randomized return decomposition

Z Ren, R Guo, Y Zhou, J Peng - arXiv preprint arXiv:2111.13485, 2021 - arxiv.org
Many practical applications of reinforcement learning require agents to learn from sparse
and delayed rewards. It challenges the ability of agents to attribute their actions to future …

Distributional reward decomposition for reinforcement learning

Z Lin, L Zhao, D Yang, T Qin… - Advances in neural …, 2019 - proceedings.neurips.cc
Many reinforcement learning (RL) tasks have specific properties that can be leveraged to
modify existing RL algorithms to adapt to those tasks and further improve performance, and …

Causal reinforcement learning: A survey

Z Deng, J Jiang, G Long, C Zhang - arXiv preprint arXiv:2307.01452, 2023 - arxiv.org
Reinforcement learning is an essential paradigm for solving sequential decision problems
under uncertainty. Despite many remarkable achievements in recent decades, applying …

Constructing a good behavior basis for transfer using generalized policy updates

S Alver, D Precup - arXiv preprint arXiv:2112.15025, 2021 - arxiv.org
We study the problem of learning a good set of policies, so that when combined together,
they can solve a wide variety of unseen reinforcement learning tasks with no or very little …