Reinforcement-learning agents with different temperature parameters explain the variety of...

R Zhang, Q Lv, J Li, J Bao, T Liu, S Liu - Robotics and Computer-Integrated …, 2022 - Elsevier

The assembly process of high precision products involves a variety of delicate operations
that are time-consuming and energy-intensive. Neither the human operators nor the robots …

Cited by 134 · Related articles · All 2 versions

A graph-based reinforcement learning-enabled approach for adaptive human-robot collaborative assembly operations

R Zhang, J Lv, J Li, J Bao, P Zheng, T Peng - Journal of Manufacturing …, 2022 - Elsevier

In today's prevailing manufacturing paradigm of mass personalization, neither human
operators nor robots alone can perform all assembly tasks efficiently. To overcome it, human …

Cited by 38 · Related articles · All 2 versions

Self-teaching adaptive dynamic programming for Gomoku

D Zhao, Z Zhang, Y Dai - Neurocomputing, 2012 - Elsevier

In this paper adaptive dynamic programming (ADP) is applied to learn to play Gomoku. The
critic network is used to evaluate board situations. The basic idea is to penalize the last …

Cited by 40 · Related articles · All 3 versions

[PDF] researchgate.net

ADP with MCTS algorithm for Gomoku

Z Tang, D Zhao, K Shao, LV Le - 2016 IEEE Symposium Series …, 2016 - ieeexplore.ieee.org

Inspired by the core idea of AlphaGo, we combine a neural network, which is trained by
Adaptive Dynamic Programming (ADP), with Monte Carlo Tree Search (MCTS) algorithm for …

Cited by 19 · Related articles · All 2 versions

[PDF] core.ac.uk

[PDF] Learning from noisy and delayed rewards: the value of reinforcement learning to defense modeling and simulation

JK Alt - 2012 - core.ac.uk

Modeling and simulation of military operations requires human behavior models capable of
learning from experience in complex environments where feedback on action quality is …

Cited by 6 · Related articles

[PDF] up.pt

Cited by 1 · Related articles · All 4 versions
