Provable and practical: Efficient exploration in reinforcement learning via Langevin Monte Carlo

H Ishfaq, Q Lan, P Xu, AR Mahmood, D Precup… - arXiv preprint arXiv …, 2023 - arxiv.org
We present a scalable and effective exploration strategy based on Thompson sampling for
reinforcement learning (RL). One of the key shortcomings of existing Thompson sampling …

Sample-Efficiency in Multi-Batch Reinforcement Learning: The Need for Dimension-Dependent Adaptivity

E Johnson, C Pike-Burke, P Rebeschini - arXiv preprint arXiv:2310.01616, 2023 - arxiv.org
We theoretically explore the relationship between sample-efficiency and adaptivity in
reinforcement learning. An algorithm is sample-efficient if it uses a number of queries $ n …