Learning optimal parameterized policy for high level strategies in a game setting

R Prakash, M Vohra, L Behera - 2019 28th IEEE International Conference on Robot and Human …, 2019 - ieeexplore.ieee.org
Complex and interactive robot manipulation skills, such as playing a game of table tennis against a human opponent, pose a multifaceted challenge and a novel problem. Accurate trajectory generation in such dynamic situations, together with an appropriate controller to respond to the incoming table tennis ball from the opponent, is only a prerequisite to winning the game. Decision making is a major part of an intelligent robot, and a policy is needed to choose and execute the action that receives the highest reward. In this paper, we address the important problem of learning the higher-level optimal strategies that enable competitive behaviour with humans in such an interactive game setting. This paper presents a novel technique to learn a higher-level strategy for the game of table tennis using P-Q Learning (a mixture of Pavlovian learning and Q-learning) to learn a parameterized policy. The cooperative learning framework of a Kohonen Self-Organizing Map (KSOM) together with a replay memory is employed for faster strategy learning in this short-horizon problem. The strategy is learnt in simulation, using a simulated human opponent and an ideal robot that can accurately perform hitting motions in its workspace. We show that our method significantly improves the average received reward in comparison to other state-of-the-art methods.
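As background for the components named in the abstract, the following is a minimal sketch of generic Q-learning combined with an experience replay memory over a small discretized set of high-level strokes. It is not the paper's P-Q Learning or KSOM implementation, which the abstract does not detail; the action names, hyperparameters, and helper functions are all illustrative assumptions.

```python
import random
from collections import deque, defaultdict

# Illustrative sketch only: tabular Q-learning with a replay memory
# over a hypothetical discrete set of high-level table tennis strokes.
# The paper's actual P-Q Learning (Pavlovian + Q-learning) update and
# KSOM-based cooperative learning are not reproduced here.

ACTIONS = ["smash", "lob", "push"]      # hypothetical high-level strokes
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2   # assumed hyperparameters

Q = defaultdict(lambda: {a: 0.0 for a in ACTIONS})
replay = deque(maxlen=10_000)           # replay memory of past transitions

def select_action(state):
    """Epsilon-greedy selection over the discrete strategy set."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(Q[state], key=Q[state].get)

def store(state, action, reward, next_state, done):
    """Record one transition for later reuse."""
    replay.append((state, action, reward, next_state, done))

def learn(batch_size=32):
    """Sample stored transitions and apply the standard Q-learning update."""
    batch = random.sample(replay, min(batch_size, len(replay)))
    for s, a, r, s_next, done in batch:
        target = r if done else r + GAMMA * max(Q[s_next].values())
        Q[s][a] += ALPHA * (target - Q[s][a])
```

Replaying stored transitions lets each interaction be reused for multiple updates, which is one reason replay memories speed up learning in short-horizon problems like the one the abstract describes.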