We study reinforcement learning (RL) with linear function approximation where the underlying transition probability kernel of the Markov decision process (MDP) is a linear …
D Zhou, Q Gu - Advances in neural information processing …, 2022 - proceedings.neurips.cc
Recent studies have shown that episodic reinforcement learning (RL) is not more difficult than bandits, even with a long planning horizon and unknown state transitions. However …
Y Yan, G Li, Y Chen, J Fan - IEEE Transactions on Information …, 2023 - ieeexplore.ieee.org
This paper is concerned with the asynchronous form of Q-learning, which applies a stochastic approximation scheme to Markovian data samples. Motivated by the recent …
Achieving sample efficiency in online episodic reinforcement learning (RL) requires optimally balancing exploration and exploitation. When it comes to a finite-horizon episodic …
Abstract Multi-distribution learning (MDL), which seeks to learn a shared model that minimizes the worst-case risk across $ k $ distinct data distributions, has emerged as a …
Z Zhang, Y Chen, JD Lee… - The Thirty Seventh Annual …, 2024 - proceedings.mlr.press
A central issue lying at the heart of online reinforcement learning (RL) is data efficiency. While a number of recent works achieved asymptotically minimal regret in online RL, the …
This paper studies tabular reinforcement learning (RL) in the hybrid setting, which assumes access to both an offline dataset and online interactions with the unknown environment. A …