Constrained DRL for Energy Efficiency Optimization in RSMA-Based Integrated Satellite Terrestrial Network

Q Zhang, L Zhu, Y Chen, S Jiang - Sensors, 2023 - mdpi.com
To accommodate the requirements of extensive coverage and ubiquitous connectivity in 6G communications, satellites play an increasingly significant role. As the number of users and devices grows explosively, new multiple access technologies are needed, and among the candidates, rate splitting multiple access (RSMA) shows great potential. Since satellites are power-limited, this paper investigates energy-efficient resource allocation in an integrated satellite terrestrial network (ISTN) that adopts the RSMA scheme. This non-convex problem, however, is challenging to solve using conventional model-based methods. Because the optimization task has a quality of service (QoS) requirement and a continuous action/state space, we propose a constrained soft actor-critic (SAC) approach. This policy-gradient algorithm incorporates the Lagrangian relaxation technique to convert the original constrained problem into a penalized unconstrained one, so that the reward is maximized while the QoS requirements are satisfied. Moreover, relearning is time-consuming and unnecessary when the network changes little, so an on–off mechanism is introduced to avoid it: by calculating the difference between the current state and the previous one, the system decides whether to learn a new action or reuse the previous action. The simulation results show that the proposed algorithm outperforms the benchmark algorithms in terms of energy efficiency while satisfying the QoS constraint. In addition, the on–off design lowers the time consumption.
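The two mechanisms named in the abstract, Lagrangian relaxation of the QoS constraint and the on–off gate, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the per-user minimum-rate form of the QoS constraint, the names `penalized_reward`, `dual_update`, and `OnOffGate`, the dual learning rate, the Euclidean state distance, and the threshold are all hypothetical. The SAC actor and critics are omitted; `policy` stands in for whatever actor produces an action.

```python
import math
from typing import Any, Callable, List, Optional, Sequence, Tuple


def penalized_reward(ee: float, rates: Sequence[float],
                     min_rates: Sequence[float],
                     lambdas: Sequence[float]) -> float:
    """Hypothetical Lagrangian reward: EE - sum_k lambda_k * (R_min_k - R_k).

    Assumes the QoS constraint is a per-user minimum rate R_k >= R_min_k.
    A violated constraint (R_k < R_min_k) lowers the reward in proportion
    to its multiplier.
    """
    return ee - sum(lam * (r_min - r)
                    for lam, r, r_min in zip(lambdas, rates, min_rates))


def dual_update(lambdas: Sequence[float], rates: Sequence[float],
                min_rates: Sequence[float], lr: float) -> List[float]:
    """Projected gradient ascent on the multipliers (assumed update rule).

    A multiplier grows while its constraint is violated and shrinks when it
    is satisfied, but is clipped so it never goes below zero.
    """
    return [max(0.0, lam + lr * (r_min - r))
            for lam, r, r_min in zip(lambdas, rates, min_rates)]


class OnOffGate:
    """Hypothetical on-off mechanism: reuse the last action if the state barely changed."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self.ref_state: Optional[List[float]] = None  # state at which the last action was learned
        self.last_action: Any = None

    def act(self, state: Sequence[float],
            policy: Callable[[Sequence[float]], Any]) -> Tuple[Any, bool]:
        """Return (action, learned_new_action)."""
        if self.ref_state is not None:
            # Euclidean distance between the current and reference states (an assumption).
            change = math.dist(state, self.ref_state)
            if change < self.threshold:
                return self.last_action, False  # "off": skip learning, reuse action
        # "on": query the policy and remember the state it was computed for.
        self.last_action = policy(state)
        self.ref_state = list(state)
        return self.last_action, True
```

Comparing against the state at which the action was last learned, rather than the immediately preceding state, is a design choice of this sketch: it keeps a slow drift from being ignored indefinitely.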