Demand response providers (DRPs) play a crucial role in retail electricity markets, as they bridge the gap between the distribution system operator (DSO) and end participants. The DRPs' primary objective is to devise a pricing strategy that maximizes their profits without access to private customer preferences. The vanilla deep reinforcement learning (DRL) paradigm is inapplicable in this context, as it requires iterative data collection through interaction with the environment. To address this challenge, we propose an offline DRL-based approach that enables DRPs to learn pricing strategies from static suboptimal data, without any online interaction. The proposed approach updates the DRPs' Q-values by incorporating an in-distribution behavior decoder and regularization terms to prevent the overestimation caused by out-of-distribution experiences. It is designed to extract better policies from suboptimal data generated by DRPs' rule-based strategies. Case studies demonstrate that the proposed approach significantly improves DRPs' profits and outperforms imitation learning, off-policy DRL, and Bayesian DRL across various environment settings. Our approach is also shown to be effective in handling different levels of uncertainty in load demands and electricity prices. Finally, our approach provides better pre-trained weights when transferred from offline to online learning to attain near-optimal strategies.
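The sketch below illustrates the kind of conservatively regularized Q-update described above, in the spirit of conservative offline Q-learning: a TD loss on logged (in-distribution) transitions plus a penalty that suppresses Q-values on out-of-distribution actions. The network architecture, the uniform sampler for out-of-distribution price actions, and the penalty weight `alpha` are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class QNet(nn.Module):
    """State-action value network for a continuous pricing action."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=-1)).squeeze(-1)

def conservative_q_loss(q_net, target_q_net, batch, gamma=0.99, alpha=1.0, n_ood=10):
    """TD loss on logged transitions plus a penalty on out-of-distribution actions."""
    s, a, r, s_next, a_next = batch  # transitions from the static, suboptimal dataset
    with torch.no_grad():
        # Bootstrap only from actions that appear in the dataset (in-distribution).
        td_target = r + gamma * target_q_net(s_next, a_next)
    td_loss = nn.functional.mse_loss(q_net(s, a), td_target)
    # Push down Q-values on randomly sampled (likely OOD) price actions and push up
    # Q-values on dataset actions, curbing the overestimation the abstract refers to.
    ood_a = torch.rand(n_ood, *a.shape)                # assumes actions scaled to [0, 1]
    ood_q = q_net(s.expand(n_ood, *s.shape), ood_a).mean()
    return td_loss + alpha * (ood_q - q_net(s, a).mean())
```

In practice, this loss would be minimized over mini-batches drawn from the static dataset of rule-based DRP pricing decisions, with the target network updated periodically or via Polyak averaging.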