Reinforcement learning (RL) is an interactive decision-making tool with well-documented merits for resource allocation tasks in uncertain environments, such as those emerging with the Internet-of-Things. While RL approaches relying on deep neural networks can attain state-of-the-art performance in several application domains, they become less attractive when the required training datasets are prohibitively large. Aiming at sample efficiency, this contribution adopts nonparametric value function models based on Gaussian processes (GPs). Relying on the temporal-difference update rule, a novel GP-SARSA approach is developed, where action selection is guided by Thompson sampling to balance exploration and exploitation. Targeting computational scalability as well, the advocated approach leverages random features to replace GP-SARSA's nonparametric function learning with a parametric approximate model. Adaptation to unknown dynamics is accomplished through an ensemble (E) of GP-SARSA learners, whose weights are updated in a data-driven fashion. The performance of the proposed (E)GP-SARSA is evaluated on a practical resource allocation problem.
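To make the ingredients summarized above concrete, the following is a minimal illustrative sketch (not the paper's implementation) of a SARSA-style learner that approximates a GP value function with random Fourier features and selects actions via Thompson sampling over the resulting Gaussian weight posterior. The class name, hyperparameters, state/action encoding, and the use of a single learner (the ensemble weighting is omitted) are all assumptions made for illustration.

```python
import numpy as np

class RFSarsaTS:
    """Sketch: random-feature (parametric) GP-SARSA with Thompson sampling."""

    def __init__(self, state_dim, n_actions, n_features=100, lengthscale=1.0,
                 noise_var=0.1, gamma=0.95, seed=0):
        rng = np.random.default_rng(seed)
        d = state_dim + n_actions                      # state concatenated with one-hot action
        # Random Fourier features approximating a Gaussian (RBF) kernel
        self.W = rng.normal(0.0, 1.0 / lengthscale, size=(n_features, d))
        self.b = rng.uniform(0.0, 2 * np.pi, size=n_features)
        self.n_actions, self.gamma, self.noise_var = n_actions, gamma, noise_var
        # Gaussian posterior over linear weights, stored in information form
        self.P = np.eye(n_features)                    # posterior precision
        self.q = np.zeros(n_features)                  # precision-weighted mean
        self.rng = rng

    def _phi(self, s, a):
        # Feature map for a state-action pair
        x = np.concatenate([np.asarray(s, dtype=float), np.eye(self.n_actions)[a]])
        return np.sqrt(2.0 / len(self.b)) * np.cos(self.W @ x + self.b)

    def select_action(self, s):
        # Thompson sampling: draw one weight vector from the posterior, act greedily w.r.t. it
        Sigma = np.linalg.inv(self.P)
        theta = self.rng.multivariate_normal(Sigma @ self.q, Sigma)
        return int(np.argmax([self._phi(s, a) @ theta for a in range(self.n_actions)]))

    def update(self, s, a, r, s_next, a_next, done):
        # SARSA-style temporal-difference target, regressed onto the current features
        mu = np.linalg.solve(self.P, self.q)
        target = r if done else r + self.gamma * (self._phi(s_next, a_next) @ mu)
        # Recursive Bayesian linear-regression update of the parametric GP surrogate
        phi = self._phi(s, a)
        self.P += np.outer(phi, phi) / self.noise_var
        self.q += phi * target / self.noise_var
```

In this sketch, a per-episode interaction loop would call `select_action` for both the current and next actions and then `update` with the observed reward; the explicit matrix inversions are kept for clarity and would be replaced by incremental Cholesky updates in a scalable implementation.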