Prioritizing useful experience replay for heuristic dynamic programming-based learning systems

Z Ni, N Malla, X Zhong - IEEE Transactions on Cybernetics, 2018 - ieeexplore.ieee.org
The adaptive dynamic programming controller usually needs a long training period because
the data usage efficiency is relatively low by discarding the samples once used. Prioritized …

Event-triggered ADP control of a class of non-affine continuous-time nonlinear systems using output information

Y Yang, C Xu, D Yue, X Zhong, X Si, J Tan - Neurocomputing, 2020 - Elsevier
An event-triggered adaptive dynamic programming (ADP) approach is proposed for a class
of non-affine continuous-time nonlinear systems with unknown internal states. A neural …

An Improved Trust-Region Method for Off-Policy Deep Reinforcement Learning

H Li, X Zhong, H He - 2023 International Joint Conference on …, 2023 - ieeexplore.ieee.org
Reinforcement learning (RL) is a powerful tool for training agents to interact with complex
environments. In particular, trust-region methods are widely used for policy optimization in …

Improved duelling deep Q-networks based path planning for intelligent agents

Y Lin, J Wen - International Journal of Vehicle Design, 2023 - inderscienceonline.com
The natural deep Q-network (DQN) usually requires a long training time because the data
usage efficiency is relatively low due to uniform sampling. Importance sampling (IS) can …