We study reinforcement learning (RL) with linear function approximation where the underlying transition probability kernel of the Markov decision process (MDP) is a linear …
J He, H Zhao, D Zhou, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
We study reinforcement learning (RL) with linear function approximation. For episodic time-inhomogeneous linear Markov decision processes (linear MDPs) whose transition …
Q Cai, Z Yang, C Jin, Z Wang - International Conference on …, 2020 - proceedings.mlr.press
While policy-based reinforcement learning (RL) achieves tremendous successes in practice, it is significantly less understood in theory, especially compared with value-based RL. In …
Two central paradigms have emerged in the reinforcement learning (RL) community: online RL and offline RL. In the online RL setting, the agent has no prior knowledge of the …
X Chen, H Zhong, Z Yang, Z Wang… - … on Machine Learning, 2022 - proceedings.mlr.press
We study human-in-the-loop reinforcement learning (RL) with trajectory preferences, where instead of receiving a numeric reward at each step, the RL agent only receives preferences …
The success of reinforcement learning in a variety of challenging sequential decision-making problems has been much discussed, but often ignored in this discussion is the …
Direct policy gradient methods for reinforcement learning are a successful approach for a variety of reasons: they are model free, they directly optimize the performance metric of …
Value function approximation has demonstrated phenomenal empirical success in reinforcement learning (RL). Nevertheless, despite a handful of recent progress on …
Reward-free reinforcement learning (RL) considers the setting where the agent does not have access to a reward function during exploration, but must propose a near-optimal policy …