Boosting Long-Delayed Reinforcement Learning with Auxiliary Short-Delayed Task

Q Wu, SS Zhan, Y Wang, CW Lin, C Lv, Q Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement learning is challenging in delayed scenarios, a common real-world situation
where observations and interactions occur with delays. State-of-the-art (SOTA) state …

Boosting Reinforcement Learning with Strongly Delayed Feedback Through Auxiliary Short Delays

Q Wu, SS Zhan, Y Wang, Y Wang, CW Lin, C Lv… - Forty-first International … - openreview.net
Reinforcement learning (RL) is challenging in the common case of delays between events
and their sensory perceptions. State-of-the-art (SOTA) state augmentation techniques either …

Variational Delayed Policy Optimization

Q Wu, SS Zhan, Y Wang, Y Wang, CW Lin, C Lv… - arXiv preprint arXiv …, 2024 - arxiv.org
In environments with delayed observation, state augmentation by including actions within
the delay window is adopted to retrieve Markovian property to enable reinforcement learning …

DEER: A Delay-Resilient Framework for Reinforcement Learning with Variable Delays

B Xia, Y Kong, B Yuan, Y Chang, Z Li, X Wang - 2023 - openreview.net
Classic reinforcement learning (RL) frequently struggles with tasks involving delays due to
the violation of the Markov property. Existing approaches usually tackle this issue with end …

Open the Black Box: Step-based Policy Updates for Temporally-Correlated Episodic Reinforcement Learning

G Li, H Zhou, D Roth, S Thilges, F Otto… - arXiv preprint arXiv …, 2024 - arxiv.org
Current advancements in reinforcement learning (RL) have predominantly focused on
learning step-based policies that generate actions for each perceived state. While these …

Revisiting state augmentation methods for reinforcement learning with stochastic delays

S Nath, M Baranwal, H Khadilkar - Proceedings of the 30th ACM …, 2021 - dl.acm.org
Several real-world scenarios, such as remote control and sensing, are comprised of action
and observation delays. The presence of delays degrades the performance of reinforcement …

Addressing Delays in Reinforcement Learning via Delayed Adversarial Imitation Learning

M Xie, B Xia, Y Yu, X Wang, Y Chang - International Conference on …, 2023 - Springer
Observation and action delays occur commonly in many real-world tasks which violate
Markov property and consequently degrade the performance of Reinforcement Learning …

Statistically Efficient Variance Reduction with Double Policy Estimation for Off-Policy Evaluation in Sequence-Modeled Reinforcement Learning

H Zhou, T Lan, V Aggarwal - arXiv preprint arXiv:2308.14897, 2023 - arxiv.org
Offline reinforcement learning aims to utilize datasets of previously gathered
environment-action interaction records to learn a policy without access to the real environment. Recent …

Time-aware Q-networks: Resolving temporal irregularity for deep reinforcement learning

YJ Kim, M Chi - arXiv preprint arXiv:2105.02580, 2021 - arxiv.org
Deep Reinforcement Learning (DRL) has shown outstanding performance on inducing
effective action policies that maximize expected long-term return on many complex tasks …

Time-aware deep reinforcement learning with multi-temporal abstraction

YJ Kim, M Chi - Applied Intelligence, 2023 - Springer
Deep reinforcement learning (DRL) is advantageous, but it rarely performs well when tested
on real-world decision-making tasks, particularly those involving irregular time series with …