Temporal Logic Specification-Conditioned Decision Transformer for Offline Safe Reinforcement Learning

Z Guo, W Zhou, W Li - arXiv preprint arXiv:2402.17217, 2024 - arxiv.org
Offline safe reinforcement learning (RL) aims to train a constraint satisfaction policy from a
fixed dataset. Current state-of-the-art approaches are based on supervised learning with a …

UAV Control Method Combining Reptile Meta-Reinforcement Learning and Generative Adversarial Imitation Learning

S Jiang, Y Ge, X Yang, W Yang, H Cui - Future Internet, 2024 - mdpi.com
Reinforcement learning (RL) is pivotal in empowering Unmanned Aerial Vehicles (UAVs) to
navigate and make decisions efficiently and intelligently within complex and dynamic …

The Application of Residual Connection-Based State Normalization Method in GAIL

Y Ge, T Huang, X Wang, G Zheng, X Yang - Mathematics, 2024 - mdpi.com
In the domain of reinforcement learning (RL), deriving efficacious state representations and
maintaining algorithmic stability are crucial for optimal agent performance. However, the …

Study on Gamma selection in the optimal operation of secondary water supply system based on deep Q-learning network

W Geng, J Yan, S Xie, D Zhang - Water Supply, 2023 - iwaponline.com
When the requirements for water pressure and quantity of drinking water for residents and
industrial buildings exceed the capacity of the urban water distribution system, a secondary …

A Dynamic and Task-Independent Reward Shaping Approach for Discrete Partially Observable Markov Decision Processes

S Nahali, H Ayadi, JX Huang, E Pakizeh… - Pacific-Asia Conference …, 2023 - Springer
Agents often need a long time to explore state-action space in order to learn how to act
expectedly in Partially Observable Markov Decision Processes (POMDPs). With the reward …

Dynamic Goal Tracking for Differential Drive Robot Using Deep Reinforcement Learning

M Shahid, SN Khan, KF Iqbal, S Ali, Y Ayaz - Neural Processing Letters, 2023 - Springer
To ensure steady navigation for a robot, stable controls are one of the basic requirements.
Control values selection is highly environment dependent. To ensure reusability of control …

PAGAR: Taming Reward Misalignment in Inverse Reinforcement Learning-Based Imitation Learning with Protagonist Antagonist Guided Adversarial Reward

W Zhou, W Li - 2023 - openreview.net
Many imitation learning (IL) algorithms employ inverse reinforcement learning (IRL) to infer
the underlying reward function that an expert is implicitly optimizing for, based on their …