Complex, interactive robot manipulation skills, such as playing table tennis against a human opponent, pose a multifaceted and novel challenge. Accurate trajectory generation in such dynamic situations and an appropriate controller for responding to the incoming ball are only prerequisites for winning the game. Decision making is a major component of an intelligent robot: a policy is needed to choose and execute the action that receives the highest reward. In this paper, we address the important problem of learning higher-level optimal strategies that enable competitive behaviour with humans in such an interactive game setting. We present a novel technique for learning a higher-level strategy for the game of table tennis using P-Q Learning (a mixture of Pavlovian learning and Q-learning) to learn a parameterized policy. The cooperative learning framework of a Kohonen Self-Organizing Map (KSOM) together with a replay memory is employed for faster strategy learning in this short-horizon problem. The strategy is learnt in simulation against a simulated human opponent, using an idealized robot that can accurately execute hitting motions anywhere in its workspace. We show that our method improves the average received reward significantly in comparison to other state-of-the-art methods.
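For readers unfamiliar with the components named above, the following is a minimal, illustrative sketch of a tabular Q-learning update drawing minibatches from a replay memory, the generic mechanism the abstract refers to. It is an assumption-laden simplification: the paper's actual P-Q Learning rule, the KSOM-based cooperative learning, and all names and hyperparameters below are not taken from the source.

```python
# Minimal sketch (assumption): tabular Q-learning with a replay memory.
# The paper's P-Q Learning update and KSOM coupling are NOT reproduced here;
# all identifiers and constants are illustrative placeholders.
import random
from collections import deque, defaultdict

GAMMA, ALPHA = 0.95, 0.1          # discount factor and learning rate (assumed values)
memory = deque(maxlen=10_000)     # replay memory of (s, a, r, s_next, done) tuples
Q = defaultdict(float)            # Q[(state, action)] -> estimated return

def store(s, a, r, s_next, done):
    """Record one interaction with the environment for later replay."""
    memory.append((s, a, r, s_next, done))

def replay_update(actions, batch_size=32):
    """Sample past transitions and apply the standard Q-learning update."""
    if len(memory) < batch_size:
        return
    for s, a, r, s_next, done in random.sample(memory, batch_size):
        # Bootstrap from the best next action unless the episode ended.
        target = r if done else r + GAMMA * max(Q[(s_next, a2)] for a2 in actions)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
```

Replaying stored transitions in this way reuses scarce experience, which is one common motivation for pairing a replay memory with value-based learning in short-horizon problems such as a single table tennis exchange.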