Recent work has demonstrated that cutting-edge advances in deep reinforcement learning (DRL) may be leveraged to empower wireless devices with the much-needed ability to "sense" current spectrum and network conditions and "react" in real time by either exploiting known optimal actions or exploring new actions. Yet, understanding whether real-time DRL can be at all applied in the resource-challenged embedded IoT domain, as well as designing IoT-tailored DRL systems and architectures, still remains mostly uncharted territory. This paper bridges the existing gap between the extensive theoretical research on wireless DRL and its system-level applications by presenting Deep Wireless Embedded Reinforcement Learning (DeepWiERL), a general-purpose, hybrid software/hardware DRL framework specifically tailored for embedded IoT wireless devices. DeepWiERL provides abstractions, circuits, software structures and drivers to support the training and real-time execution of state-of-the-art DRL algorithms on the device's hardware. Moreover, DeepWiERL includes a novel supervised DRL model selection and bootstrap (S-DMSB) technique that leverages transfer learning and high-level synthesis (HLS) circuit design to orchestrate a neural network architecture that satisfies hardware and application throughput constraints and speeds up the DRL algorithm convergence. Experimental evaluation on a fully-custom software-defined radio testbed (i) proves for the first time the feasibility of real-time DRL-based algorithms on a real-world wireless platform with multiple channel conditions; (ii) shows that DeepWiERL supports 16x data rate and consumes 14x less energy than a software-based implementation, and (iii) indicates that S-DMSB may improve the DRL convergence time by 6x and increase the obtained reward by 45% if prior channel knowledge is available.