Q Wu, SS Zhan, Y Wang, Y Wang, CW Lin, C Lv… - arXiv preprint arXiv …, 2024 - arxiv.org
In environments with delayed observations, state augmentation that includes the actions within
the delay window is adopted to retrieve the Markovian property and enable reinforcement learning …
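
As a rough illustration of the state augmentation described in the snippet, the following minimal Python sketch pairs the most recent observation available to the agent with the actions taken inside the delay window. It is not taken from the paper: the class name `DelayedObservationAugmenter`, the parameters `delay` and `noop_action`, and the choice to pad the initial window with a no-op action are all assumptions made for this example.

```python
from collections import deque


class DelayedObservationAugmenter:
    """Hypothetical helper: builds (delayed state, actions in delay window)."""

    def __init__(self, delay, initial_state, noop_action=0):
        self.delay = delay
        # States the environment has produced; only the oldest one (index 0)
        # is assumed to be visible to the agent at the current step.
        self.pending = deque([initial_state] * (delay + 1), maxlen=delay + 1)
        # Actions taken since the visible state; padded with a no-op
        # (an assumption) before any real action has been taken.
        self.actions = deque([noop_action] * delay, maxlen=delay)

    def augmented_state(self):
        # Augmented state: delayed observation plus the action window.
        return (self.pending[0], tuple(self.actions))

    def step(self, action, new_true_state):
        # Record the newest action and the (not yet visible) resulting state;
        # the bounded deques drop the oldest entries automatically.
        self.actions.append(action)
        self.pending.append(new_true_state)
        return self.augmented_state()


if __name__ == "__main__":
    aug = DelayedObservationAugmenter(delay=2, initial_state="s0", noop_action="noop")
    for action, state in [("a1", "s1"), ("a2", "s2"), ("a3", "s3")]:
        print(aug.step(action, state))
```

With a delay of 2, the agent in this sketch acts on the state from two steps earlier together with the two actions taken since, which is the information the augmented state is meant to carry.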