Towards robust offline reinforcement learning under diverse data corruption

R Yang, H Zhong, J Xu, A Zhang, C Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Offline reinforcement learning (RL) presents a promising approach for learning reinforced
policies from offline datasets without the need for costly or unsafe interactions with the …

On the hidden biases of policy mirror ascent in continuous action spaces

AS Bedi, S Chakraborty, A Parayil… - International …, 2022 - proceedings.mlr.press
We focus on parameterized policy search for reinforcement learning over continuous action
spaces. Typically, one assumes the score function associated with a policy is bounded …

Limit theorems for stochastic gradient descent with infinite variance

J Blanchet, A Mijatović, W Yang - arXiv preprint arXiv:2410.16340, 2024 - arxiv.org
Stochastic gradient descent is a classic algorithm that has gained great popularity especially
in the last decades as the most common approach for training models in machine learning …

On the sample complexity and metastability of heavy-tailed policy search in continuous control

AS Bedi, A Parayil, J Zhang, M Wang… - Journal of Machine …, 2024 - jmlr.org
Reinforcement learning is a framework for interactive decision-making with incentives
sequentially revealed across time without a system dynamics model. Due to its scaling to …

From Gradient Clipping to Normalization for Heavy Tailed SGD

F Hübler, I Fatkhullin, N He - arXiv preprint arXiv:2410.13849, 2024 - arxiv.org
Recent empirical evidence indicates that many machine learning applications involve heavy-tailed
gradient noise, which challenges the standard assumptions of bounded variance in …

HTRON: Efficient outdoor navigation with sparse rewards via heavy tailed adaptive reinforce algorithm

K Weerakoon, S Chakraborty, N Karapetyan… - arXiv preprint arXiv …, 2022 - arxiv.org
We present a novel approach to improve the performance of deep reinforcement learning
(DRL) based outdoor robot navigation systems. Most existing DRL methods are based on …

Heavy-tailed streaming statistical estimation

CP Tsai, A Prasad, S Balakrishnan… - International …, 2022 - proceedings.mlr.press
We consider the task of heavy-tailed statistical estimation given streaming $p$-dimensional
samples. This could also be viewed as stochastic optimization under heavy-tailed …

Simple Policy Optimization

Z Xie, Q Zhang, R Xu - arXiv preprint arXiv:2401.16025, 2024 - arxiv.org
As one of the most important and influential algorithms in reinforcement learning, the
Proximal Policy Optimization (PPO) algorithm has demonstrated outstanding performance …

Uncertainty quantification for operators in online reinforcement learning

B Wang, J Wu, X Li, J Shen, Y Zhong - Knowledge-Based Systems, 2022 - Elsevier
In online reinforcement learning, operators predict the return by weighting the successors'
estimated value. However, due to the lack of uncertainty quantification, weights assigned by …

Towards Robust and Resilient Machine Learning

A Prasad - 2022 - kilthub.cmu.edu
Some common assumptions when building machine learning pipelines are: (1) the training
data is sufficiently “clean” and well-behaved, so that there are few or no outliers, or that the …