Optimizing for the future in non-stationary MDPs

Y Chandak, G Theocharous… - International …, 2020 - proceedings.mlr.press
Most reinforcement learning methods are based upon the key assumption that the transition
dynamics and reward functions are fixed, that is, the underlying Markov decision process is …

Robust policy gradient against strong data corruption

X Zhang, Y Chen, X Zhu, W Sun - … Conference on Machine …, 2021 - proceedings.mlr.press
We study the problem of robust reinforcement learning under adversarial corruption on both
rewards and transitions. Our attack model assumes an adaptive adversary who can …

AdaPool: A diurnal-adaptive fleet management framework using model-free deep reinforcement learning and change point detection

M Haliem, V Aggarwal… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
This paper introduces an adaptive, model-free deep reinforcement learning approach that can
recognize and adapt to the diurnal patterns in the ride-sharing environment with car-pooling …

Weathering ongoing uncertainty: Learning and planning in a time-varying partially observable environment

G Puthumanaillam, X Liu, N Mehr… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Optimal decision-making presents a significant challenge for autonomous systems operating
in uncertain, stochastic and time-varying environments. Environmental variability over time …

A Moral Imperative: The Need for Continual Superalignment of Large Language Models

G Puthumanaillam, M Vora, P Thangeda… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper examines the challenges associated with achieving life-long superalignment in
AI systems, particularly large language models (LLMs). Superalignment is a theoretical …

The complexity of non-stationary reinforcement learning

B Peng, C Papadimitriou - International Conference on …, 2024 - proceedings.mlr.press
The problem of continual learning in the domain of reinforcement learning, often called non-
stationary reinforcement learning, has been identified as an important challenge to the …

ACRE: Actor-Critic with Reward-Preserving Exploration

AC Kapoutsis, DI Koutras, CD Korkas… - Neural Computing and …, 2023 - Springer
While reinforcement learning (RL) algorithms have generated impressive strategies for a
wide range of tasks, the performance improvements in continuous-domain, real-world …

Client selection for federated policy optimization with environment heterogeneity

Z Xie, SH Song - arXiv preprint arXiv:2305.10978, 2023 - arxiv.org
The development of Policy Iteration (PI) has inspired many recent algorithms for
Reinforcement Learning (RL), including several policy gradient methods that gained both …

Model-Free Learning and Optimal Policy Design in Multi-Agent MDPs Under Probabilistic Agent Dropout

C Fiscko, S Kar, B Sinopoli - IEEE Transactions on Control of …, 2024 - ieeexplore.ieee.org
This work studies a multi-agent Markov decision process (MDP) that can undergo agent
dropout and the computation of policies for the post-dropout system based on control and …