Human-in-the-loop reinforcement learning in continuous-action space

B Luo, Z Wu, F Zhou, BC Wang - IEEE Transactions on Neural …, 2023 - ieeexplore.ieee.org
Human-in-the-loop for reinforcement learning (RL) is usually employed to overcome the
challenge of sample inefficiency, in which the human expert provides advice for the agent …

Joint optimization of maintenance and quality inspection for manufacturing networks based on deep reinforcement learning

Z Ye, Z Cai, H Yang, S Si, F Zhou - Reliability engineering & system safety, 2023 - Elsevier
Most existing studies on joint optimization of manufacturing systems (MS) focus on small-scale
systems with simple structures, such as the single-machine, simple serial, or parallel …

Learn Zero-Constraint-Violation Safe Policy in Model-Free Constrained Reinforcement Learning

H Ma, C Liu, SE Li, S Zheng, W Sun… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
We focus on learning the zero-constraint-violation safe policy in model-free reinforcement
learning (RL). Existing model-free RL studies mostly use the posterior penalty to penalize …

Shielded Planning Guided Data-Efficient and Safe Reinforcement Learning

H Wang, J Qin, Z Kan - IEEE Transactions on Neural Networks …, 2024 - ieeexplore.ieee.org
Safe reinforcement learning (RL) has shown great potential for building safe general-purpose
robotic systems. While many existing works have focused on post-training policy …

MORTAR: A Model-based Runtime Action Repair Framework for AI-enabled Cyber-Physical Systems

R Wang, Z Zhou, J Song, X Xie, X Xie, L Ma - arXiv preprint arXiv …, 2024 - arxiv.org
Cyber-Physical Systems (CPSs) are increasingly prevalent across various industrial and
daily-life domains, with applications ranging from robotic operations to autonomous driving …

Optimal dispatch of unbalanced distribution networks with phase-changing soft open points based on safe reinforcement learning

L Hong, L Qizhe, Z Qiang, X Zhengyang… - … Energy, Grids and …, 2024 - Elsevier
Distributed energy resources and uneven load allocation cause the three-phase unbalance
in distribution networks, which may harm the health of power equipment and increase the …

Information fusion for online estimation of the behavior of traffic participants using belief function theory

T Benciolini, X Zhang, D Wollherr… - Frontiers in Future …, 2023 - frontiersin.org
Motion planning algorithms for automated vehicles need to assess the intended behavior of
other Traffic Participants (TPs), in order to predict the likely future trajectory of TPs and plan …

Opinion-Guided Reinforcement Learning

K Dagenais, I David - arXiv preprint arXiv:2405.17287, 2024 - arxiv.org
Human guidance is often desired in reinforcement learning to improve the performance of
the learning agent. However, human insights are often mere opinions and educated …

GenSafe: A Generalizable Safety Enhancer for Safe Reinforcement Learning Algorithms Based on Reduced Order Markov Decision Process Model

Z Zhou, X Xie, J Song, Z Shu, L Ma - arXiv preprint arXiv:2406.03912, 2024 - arxiv.org
Although deep reinforcement learning has demonstrated impressive achievements in
controlling various autonomous systems, e.g., autonomous vehicles or humanoid robots, its …

A train trajectory optimization method based on the safety reinforcement learning with a relaxed dynamic reward

L Cheng, J Cao, X Yang, W Wang, Z Zhou - Discover Applied Sciences, 2024 - Springer
Train trajectory optimization (TTO) is an effective way to address energy consumption in rail
transit. Reinforcement learning (RL), an excellent optimization method, has been used to …