There is a consensus in industry and academia that 6G wireless networks will incorporate massively heterogeneous radio access technologies (RATs) in order to meet the high quality of service (QoS) demands of next-generation wireless applications. Allocating radio resources in such large-scale networks will pose additional challenges for traditional resource allocation techniques. Hence, it becomes essential to develop solutions that cope with the high complexity and massive heterogeneity of 6G networks and that can support system scalability. In this context, AI-based tools are expected to overcome some of the difficulties encountered in traditional radio resource allocation (RRA) approaches. In this paper, we highlight the heterogeneity and scalability issues of 6G networks. We then describe the challenges that state-of-the-art RRA methods face when applied to 6G heterogeneous networks (HetNets) and show that emerging deep reinforcement learning (DRL) methods are robust in addressing them. To this end, we propose a DRL-based framework that addresses multi-RAT assignment and dynamic power allocation in 6G HetNets, and show via simulation its ability to support the massive heterogeneity of 6G networks. Finally, we outline some promising directions for future research.