Fourier policy gradients

Momentum-based policy gradient methods

F Huang, S Gao, J Pei, H Huang - … conference on machine …, 2020 - proceedings.mlr.press

In the paper, we propose a class of efficient momentum-based policy gradient methods for
the model-free reinforcement learning, which use adaptive learning rates and do not require …

Cited by 53 - Related articles - All 6 versions

[PDF] springer.com

Importance sampling in reinforcement learning with an estimated behavior policy

JP Hanna, S Niekum, P Stone - Machine Learning, 2021 - Springer

In reinforcement learning, importance sampling is a widely used method for evaluating an
expectation under the distribution of data of one policy when the data has in fact been …

Cited by 37 - Related articles - All 13 versions

[PDF] neurips.cc

Virel: A variational inference framework for reinforcement learning

M Fellows, A Mahajan, TGJ Rudner… - Advances in neural …, 2019 - proceedings.neurips.cc

Applying probabilistic models to reinforcement learning (RL) enables the uses of powerful
optimisation tools such as variational inference in RL. However, existing inference …

Cited by 63 - Related articles - All 15 versions

[PDF] arxiv.org

Reinforcement learning for portfolio management

A Filos - arXiv preprint arXiv:1909.09571, 2019 - arxiv.org

In this thesis, we develop a comprehensive account of the expressive power, modelling
efficiency, and performance advantages of so-called trading agents (i.e., Deep Soft Recurrent …

Cited by 46 - Related articles - All 4 versions

[PDF] neurips.cc

Bayesian bellman operators

M Fellows, K Hartikainen… - Advances in Neural …, 2021 - proceedings.neurips.cc

We introduce a novel perspective on Bayesian reinforcement learning (RL); whereas
existing approaches infer a posterior over the transition distribution or Q-function, we …

Cited by 21 - Related articles - All 7 versions

[PDF] arxiv.org

Fourier Controller Networks for Real-Time Decision-Making in Embodied Learning

H Tan, S Liu, K Ma, C Ying, X Zhang, H Su… - arXiv preprint arXiv …, 2024 - arxiv.org

Reinforcement learning is able to obtain generalized low-level robot policies on diverse
robotics datasets in embodied learning scenarios, and Transformer has been widely used to …

Cited by 1 - Related articles - All 3 versions

[PDF] jair.org

Low-rank representation of reinforcement learning policies

B Mazoure, T Doan, T Li, V Makarenkov… - Journal of Artificial …, 2022 - jair.org

We propose a general framework for policy representation for reinforcement learning tasks.
This framework involves finding a low-dimensional embedding of the policy on a …

Cited by 1 - Related articles - All 8 versions

[PDF] neurips.cc

Taylor TD-learning

M Garibbo, M Robeyns… - Advances in Neural …, 2024 - proceedings.neurips.cc

Many reinforcement learning approaches rely on temporal-difference (TD) learning to learn
a critic. However, TD-learning updates can be high variance. Here, we introduce a model …

Cited by 3 - Related articles - All 7 versions

[PDF] arxiv.org

All-action policy gradient methods: A numerical integration approach

B Petit, L Amdahl-Culleton, Y Liu, J Smith… - arXiv preprint arXiv …, 2019 - arxiv.org

While often stated as an instance of the likelihood ratio trick [Rubinstein, 1989], the original
policy gradient theorem [Sutton, 1999] involves an integral over the action space. When this …

Cited by 9 - Related articles - All 3 versions

[PDF] polimi.it

Safe policy optimization

M Papini - 2020 - politesi.polimi.it

Policy Optimization (PO) is a family of reinforcement learning algorithms that is particularly
suited to real-world control tasks due to its ability of managing high-dimensional decision …

Cited by 4 - Related articles - All 2 versions

Momentum-based policy gradient methods
