Momentum-based policy gradient methods

F Huang, S Gao, J Pei, H Huang - … conference on machine …, 2020 - proceedings.mlr.press
In this paper, we propose a class of efficient momentum-based policy gradient methods for
model-free reinforcement learning, which use adaptive learning rates and do not require …

Importance sampling in reinforcement learning with an estimated behavior policy

JP Hanna, S Niekum, P Stone - Machine Learning, 2021 - Springer
In reinforcement learning, importance sampling is a widely used method for evaluating an
expectation under the distribution of data of one policy when the data has in fact been …
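The setting in the snippet can be illustrated with a minimal one-step (bandit) sketch. It computes an ordinary importance-sampling estimate of a target policy's expected reward twice: once with the true behavior probabilities, and once with behavior probabilities estimated from action counts in the same data. The bandit setting, the hypothetical helpers `is_estimate` and `estimate_behavior`, and the count-based estimator are illustrative assumptions. They are not necessarily the estimator used in the paper.

```python
from collections import Counter


def is_estimate(actions, rewards, target_probs, behavior_probs):
    """Ordinary importance sampling: mean of pi(a) / b(a) * r over logged (a, r) pairs."""
    total = 0.0
    for a, r in zip(actions, rewards):
        total += target_probs[a] / behavior_probs[a] * r
    return total / len(actions)


def estimate_behavior(actions, n_actions):
    """Hypothetical count-based (maximum-likelihood) estimate of b(a) from logged actions."""
    counts = Counter(actions)  # missing actions count as 0
    n = len(actions)
    return [counts[a] / n for a in range(n_actions)]


if __name__ == "__main__":
    # Logged data from a behavior policy that is truly uniform over two actions,
    # but which happened to pick action 0 three times out of four.
    actions = [0, 0, 0, 1]
    rewards = [1.0, 1.0, 1.0, 0.0]  # action 0 always pays 1, action 1 pays 0
    target = [0.5, 0.5]             # policy to evaluate; its true value is 0.5

    print(is_estimate(actions, rewards, target, [0.5, 0.5]))  # 0.75 with the true b
    print(is_estimate(actions, rewards, target, estimate_behavior(actions, 2)))  # approximately 0.5
```

With the true behavior probabilities, the estimate is skewed by the over-sampled action. Weighting by the empirical action frequencies corrects that sampling error in this toy case, which hints at why an estimated behavior policy can help.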

VIREL: A variational inference framework for reinforcement learning

M Fellows, A Mahajan, TGJ Rudner… - Advances in neural …, 2019 - proceedings.neurips.cc
Applying probabilistic models to reinforcement learning (RL) enables the use of powerful
optimisation tools such as variational inference in RL. However, existing inference …

Reinforcement learning for portfolio management

A Filos - arXiv preprint arXiv:1909.09571, 2019 - arxiv.org
In this thesis, we develop a comprehensive account of the expressive power, modelling
efficiency, and performance advantages of so-called trading agents (i.e., Deep Soft Recurrent …

Bayesian Bellman operators

M Fellows, K Hartikainen… - Advances in Neural …, 2021 - proceedings.neurips.cc
We introduce a novel perspective on Bayesian reinforcement learning (RL); whereas
existing approaches infer a posterior over the transition distribution or Q-function, we …

Fourier Controller Networks for Real-Time Decision-Making in Embodied Learning

H Tan, S Liu, K Ma, C Ying, X Zhang, H Su… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement learning is able to obtain generalized low-level robot policies on diverse
robotics datasets in embodied learning scenarios, and Transformer has been widely used to …

Low-rank representation of reinforcement learning policies

B Mazoure, T Doan, T Li, V Makarenkov… - Journal of Artificial …, 2022 - jair.org
We propose a general framework for policy representation for reinforcement learning tasks.
This framework involves finding a low-dimensional embedding of the policy on a …

Taylor TD-learning

M Garibbo, M Robeyns… - Advances in Neural …, 2024 - proceedings.neurips.cc
Many reinforcement learning approaches rely on temporal-difference (TD) learning to learn
a critic. However, TD-learning updates can be high variance. Here, we introduce a model …
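For reference, a standard tabular TD(0) critic, the kind of TD-learning update the snippet refers to, moves V(s) toward the bootstrapped target r + gamma * V(s'). This is plain TD(0) on a hypothetical two-state chain, not the Taylor TD method introduced in the paper. The function name `td0` and the (state, reward, next_state, done) transition format are assumptions for illustration.

```python
def td0(episodes, n_states, alpha=0.1, gamma=0.9):
    """Tabular TD(0) state-value learning.

    Each episode is a list of (state, reward, next_state, done) transitions;
    next_state is ignored when done is True.
    """
    V = [0.0] * n_states
    for episode in episodes:
        for s, r, s_next, done in episode:
            # Bootstrapped target: no future value after a terminal transition.
            target = r if done else r + gamma * V[s_next]
            # Move V(s) toward the target by a fraction alpha of the TD error.
            V[s] += alpha * (target - V[s])
    return V


if __name__ == "__main__":
    # Deterministic chain: state 0 -> state 1 (reward 0) -> terminal (reward 1).
    episode = [(0, 0.0, 1, False), (1, 1.0, None, True)]
    V = td0([episode] * 500, n_states=2, alpha=0.5, gamma=0.9)
    print(V)  # approaches [0.9, 1.0], i.e. [gamma * 1, 1]
```

Each update uses a single sampled transition and the current (possibly wrong) estimate of V(s'), which is the source of the update variance the snippet mentions.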

All-action policy gradient methods: A numerical integration approach

B Petit, L Amdahl-Culleton, Y Liu, J Smith… - arXiv preprint arXiv …, 2019 - arxiv.org
While often stated as an instance of the likelihood ratio trick [Rubinstein, 1989], the original
policy gradient theorem [Sutton, 1999] involves an integral over the action space. When this …
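For a discrete action space, the integral over actions reduces to a sum, so an "all-action" gradient can be computed exactly. The likelihood ratio trick instead replaces that sum with a sampled estimate. The sketch contrasts the two for a one-step softmax policy with known action values Q(a). For J = sum_a pi(a) Q(a), the exact gradient with respect to logit j is pi(j) * (Q(j) - V), where V = sum_a pi(a) Q(a). The function names, the softmax parameterization, and the known-Q assumption are illustrative. The paper's numerical-integration scheme for continuous actions is not reproduced here.

```python
import math
import random


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def all_action_gradient(logits, q):
    """Exact gradient of J = sum_a pi(a) Q(a) w.r.t. softmax logits: pi_j * (Q_j - V)."""
    pi = softmax(logits)
    v = sum(p * qa for p, qa in zip(pi, q))
    return [p * (qa - v) for p, qa in zip(pi, q)]


def likelihood_ratio_gradient(logits, q, n_samples, rng):
    """Monte Carlo estimate: average of grad log pi(a) * Q(a) with a ~ pi."""
    pi = softmax(logits)
    k = len(pi)
    grad = [0.0] * k
    for _ in range(n_samples):
        a = rng.choices(range(k), weights=pi)[0]
        for j in range(k):
            # For softmax logits, d log pi(a) / d logit_j = 1[j == a] - pi(j).
            indicator = 1.0 if j == a else 0.0
            grad[j] += (indicator - pi[j]) * q[a]
    return [g / n_samples for g in grad]


if __name__ == "__main__":
    logits = [0.0, 0.0]
    q = [1.0, 0.0]  # action 0 is better
    print(all_action_gradient(logits, q))  # [0.25, -0.25]
    print(likelihood_ratio_gradient(logits, q, 1000, random.Random(0)))  # noisy estimate of the same vector
```

Both functions target the same vector. The all-action version has no sampling variance, which is one reason to prefer integrating over actions when that is tractable.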

Safe policy optimization

M Papini - 2020 - politesi.polimi.it
Policy Optimization (PO) is a family of reinforcement learning algorithms that is particularly
suited to real-world control tasks due to its ability to manage high-dimensional decision …