Cyber deception aims to misrepresent the state of the network to mislead the attackers, falsify their reconnaissance conclusions, and deflect them away from their goals. Honeypots serve as decoy devices inside networks that can capture adversaries for monitoring purposes. We propose a two-phase deception approach based on honeypot allocation. In the first phase, we develop a proactive deceptive honeypot allocation policy, the second phase proposes a reactive deception approach that dynamically allocates honeypots according to IDS updates. Considering a practical scenario, the defender partially monitors the adversary’s activities. To this end, we develop our deception approach using a combination of game-theoretic and reinforcement learning models. We cast the problem of reactive deception as a partially observable Markov decision process (POMDP) based on a game-theoretic dynamic model to accommodate the imperfect monitoring of the actions taken by the attacker. We solve this combined partially observable game model using Monte-Carlo tree search to overcome the game model complexity. We give a game-theoretic analysis to explain the attack-defense policies at equilibrium. Finally, we present numerical results to validate the effectiveness of the proposed deception approach.