On-policy reinforcement learning via ensemble Gaussian processes with application to resource allocation

KD Polyzos, Q Lu, A Sadeghi… - 2021 55th Asilomar …, 2021 - ieeexplore.ieee.org
2021 55th Asilomar Conference on Signals, Systems, and Computers, 2021, ieeexplore.ieee.org
Reinforcement learning (RL) is an interactive decision-making tool with well-documented merits for resource allocation tasks in uncertain environments, such as those emerging with the Internet of Things. While RL using deep neural networks can attain state-of-the-art performance in several application domains, it can be less attractive when the training datasets involved are prohibitively large. Aiming at sample efficiency, this contribution adopts nonparametric value function models using Gaussian processes (GPs). Relying on the temporal-difference update rule, a novel GP-SARSA approach is developed, where action selection is guided by Thompson sampling to balance exploration and exploitation. To also achieve computational scalability, the advocated approach leverages random features that replace GP-SARSA's nonparametric function learning with a parametric approximate model. Adaptation to unknown dynamics is accomplished through an ensemble (E) of GP-SARSA learners, whose weights are updated in a data-driven fashion. The performance of the proposed (E)GP-SARSA is evaluated on a practical resource allocation problem.
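The ingredients named in the abstract (a random-feature surrogate for the GP value function, Thompson sampling for action selection, a SARSA-style temporal-difference target, and data-driven ensemble weights) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and not the authors' implementation: the class names `RFSarsaLearner` and `EnsembleSarsa`, the RBF kernel with random Fourier features, the Bayesian linear-regression update, and the weight rule based on the Gaussian predictive likelihood of the TD target are all hypothetical choices.

```python
import numpy as np


class RFSarsaLearner:
    """Hypothetical learner: Bayesian linear Q-model on random Fourier features."""

    def __init__(self, dim, n_features=50, lengthscale=1.0,
                 noise_var=0.1, gamma=0.9, seed=0):
        self.rng = np.random.default_rng(seed)
        # Random Fourier features approximating an RBF kernel (assumed kernel).
        self.W = self.rng.normal(scale=1.0 / lengthscale, size=(n_features, dim))
        self.b = self.rng.uniform(0.0, 2.0 * np.pi, size=n_features)
        self.D = n_features
        self.noise_var = noise_var
        self.gamma = gamma
        self.P = np.eye(n_features)      # posterior precision, prior N(0, I)
        self.q = np.zeros(n_features)    # precision-weighted posterior mean

    def features(self, s, a):
        x = np.concatenate([np.atleast_1d(s), np.atleast_1d(a)]).astype(float)
        return np.sqrt(2.0 / self.D) * np.cos(self.W @ x + self.b)

    def posterior(self):
        cov = np.linalg.inv(self.P)
        cov = 0.5 * (cov + cov.T)        # symmetrize against round-off
        return cov @ self.q, cov

    def act(self, s, actions):
        # Thompson sampling: draw one parameter vector, act greedily on it.
        mean, cov = self.posterior()
        theta = self.rng.multivariate_normal(mean, cov)
        q_vals = [self.features(s, a) @ theta for a in actions]
        return actions[int(np.argmax(q_vals))]

    def update(self, s, a, r, s_next, a_next):
        """SARSA-style update; returns the log predictive likelihood of the target."""
        mean, cov = self.posterior()
        phi = self.features(s, a)
        # Bootstrapped TD target using the on-policy next action.
        target = r + self.gamma * (self.features(s_next, a_next) @ mean)
        pred_var = phi @ cov @ phi + self.noise_var
        log_lik = -0.5 * (np.log(2.0 * np.pi * pred_var)
                          + (target - phi @ mean) ** 2 / pred_var)
        # Conjugate Bayesian linear-regression update with the target as label.
        self.P += np.outer(phi, phi) / self.noise_var
        self.q += phi * target / self.noise_var
        return log_lik


class EnsembleSarsa:
    """Hypothetical ensemble: weights follow accumulated predictive likelihoods."""

    def __init__(self, learners, seed=0):
        self.learners = learners
        self.log_w = np.zeros(len(learners))
        self.rng = np.random.default_rng(seed)

    @property
    def weights(self):
        w = np.exp(self.log_w - self.log_w.max())
        return w / w.sum()

    def act(self, s, actions):
        # Pick a learner according to its weight, then Thompson-sample from it.
        k = self.rng.choice(len(self.learners), p=self.weights)
        return self.learners[k].act(s, actions)

    def update(self, s, a, r, s_next, a_next):
        for k, learner in enumerate(self.learners):
            self.log_w[k] += learner.update(s, a, r, s_next, a_next)
```

In this sketch each ensemble member could use a different kernel hyperparameter (for example, a different `lengthscale`), so that the weights shift toward whichever member currently predicts the TD targets best. Whether this matches the weight update used in the paper cannot be determined from the abstract alone.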