Optimizing for the future in non-stationary MDPs

Y Chandak, G Theocharous… - International …, 2020 - proceedings.mlr.press
Most reinforcement learning methods are based upon the key assumption that the transition
dynamics and reward functions are fixed, that is, the underlying Markov decision process is …

Robust policy gradient against strong data corruption

X Zhang, Y Chen, X Zhu, W Sun - … Conference on Machine …, 2021 - proceedings.mlr.press
We study the problem of robust reinforcement learning under adversarial corruption on both
rewards and transitions. Our attack model assumes an adaptive adversary who can …

AdaPool: A diurnal-adaptive fleet management framework using model-free deep reinforcement learning and change point detection

M Haliem, V Aggarwal… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
This paper introduces an adaptive, model-free deep reinforcement learning approach that can
recognize and adapt to the diurnal patterns in the ride-sharing environment with car-pooling …

Weathering ongoing uncertainty: Learning and planning in a time-varying partially observable environment

G Puthumanaillam, X Liu, N Mehr… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Optimal decision-making presents a significant challenge for autonomous systems operating
in uncertain, stochastic and time-varying environments. Environmental variability over time …

A Moral Imperative: The Need for Continual Superalignment of Large Language Models

G Puthumanaillam, M Vora, P Thangeda… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper examines the challenges associated with achieving life-long superalignment in
AI systems, particularly large language models (LLMs). Superalignment is a theoretical …

The complexity of non-stationary reinforcement learning

B Peng, C Papadimitriou - International Conference on …, 2024 - proceedings.mlr.press
The problem of continual learning in the domain of reinforcement learning, often called non-
stationary reinforcement learning, has been identified as an important challenge to the …

ACRE: Actor-Critic with Reward-Preserving Exploration

AC Kapoutsis, DI Koutras, CD Korkas… - Neural Computing and …, 2023 - Springer
While reinforcement learning (RL) algorithms have generated impressive strategies for a
wide range of tasks, the performance improvements in continuous-domain, real-world …

Client selection for federated policy optimization with environment heterogeneity

Z Xie, SH Song - arXiv preprint arXiv:2305.10978, 2023 - arxiv.org
The development of Policy Iteration (PI) has inspired many recent algorithms for
Reinforcement Learning (RL), including several policy gradient methods that gained both …

Model-Free Learning and Optimal Policy Design in Multi-Agent MDPs Under Probabilistic Agent Dropout

C Fiscko, S Kar, B Sinopoli - IEEE Transactions on Control of …, 2024 - ieeexplore.ieee.org
This work studies a multi-agent Markov decision process (MDP) that can undergo agent
dropout and the computation of policies for the post-dropout system based on control and …