reinforcement learning (DRL) has recently attracted considerable attention for its ability to
optimally control the complex behavior of HVAC systems. However, the adaptability
challenges that a DRL agent may face during the deployment phase remain underexplored.
Online learning is not realistic for such applications because of the long learning period
and the likely poor comfort control during the learning process …