On proximal policy optimization’s heavy-tailed gradients

R Yang, H Zhong, J Xu, A Zhang, C Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Offline reinforcement learning (RL) presents a promising approach for learning reinforced
policies from offline datasets without the need for costly or unsafe interactions with the …

Cited by 13 · Related articles · All 3 versions

[PDF] mlr.press

On the hidden biases of policy mirror ascent in continuous action spaces

AS Bedi, S Chakraborty, A Parayil… - International …, 2022 - proceedings.mlr.press

We focus on parameterized policy search for reinforcement learning over continuous action
spaces. Typically, one assumes the score function associated with a policy is bounded …

Cited by 20 · Related articles · All 6 versions

[PDF] arxiv.org

Limit theorems for stochastic gradient descent with infinite variance

J Blanchet, A Mijatović, W Yang - arXiv preprint arXiv:2410.16340, 2024 - arxiv.org

Stochastic gradient descent is a classic algorithm that has gained great popularity especially
in the last decades as the most common approach for training models in machine learning …

Cited by 2 · Related articles · All 2 versions

[PDF] jmlr.org

On the sample complexity and metastability of heavy-tailed policy search in continuous control

AS Bedi, A Parayil, J Zhang, M Wang… - Journal of Machine …, 2024 - jmlr.org

Reinforcement learning is a framework for interactive decision-making with incentives
sequentially revealed across time without a system dynamics model. Due to its scaling to …

Cited by 13 · Related articles · All 3 versions

[PDF] arxiv.org

From Gradient Clipping to Normalization for Heavy Tailed SGD

F Hübler, I Fatkhullin, N He - arXiv preprint arXiv:2410.13849, 2024 - arxiv.org

Recent empirical evidence indicates that many machine learning applications involve heavy-
tailed gradient noise, which challenges the standard assumptions of bounded variance in …

Cited by 1 · Related articles · All 4 versions

[PDF] arxiv.org

Htron: Efficient outdoor navigation with sparse rewards via heavy tailed adaptive reinforce algorithm

K Weerakoon, S Chakraborty, N Karapetyan… - arXiv preprint arXiv …, 2022 - arxiv.org

We present a novel approach to improve the performance of deep reinforcement learning
(DRL) based outdoor robot navigation systems. Most existing DRL methods are based on …

Cited by 11 · Related articles · All 6 versions

[PDF] mlr.press

Heavy-tailed streaming statistical estimation

CP Tsai, A Prasad, S Balakrishnan… - International …, 2022 - proceedings.mlr.press

We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional
samples. This could also be viewed as stochastic optimization under heavy-tailed …

Cited by 11 · Related articles · All 5 versions

[PDF] arxiv.org

Simple Policy Optimization

Z Xie, Q Zhang, R Xu - arXiv preprint arXiv:2401.16025, 2024 - arxiv.org

As one of the most important and influential algorithms in reinforcement learning, the
Proximal Policy Optimization (PPO) algorithm has demonstrated outstanding performance …

Uncertainty quantification for operators in online reinforcement learning

B Wang, J Wu, X Li, J Shen, Y Zhong - Knowledge-Based Systems, 2022 - Elsevier

In online reinforcement learning, operators predict the return by weighting the successors'
estimated value. However, due to the lack of uncertainty quantification, weights assigned by …

Cited by 2 · Related articles · All 3 versions

[PDF] cmu.edu

Towards Robust and Resilient Machine Learning

A Prasad - 2022 - kilthub.cmu.edu

Some common assumptions when building machine learning pipelines are: (1) the training
data is sufficiently “clean” and well-behaved, so that there are few or no outliers, or that the …

Cited by 3 · Related articles
