Policy finetuning: Bridging sample-efficient offline and online reinforcement learning

T Xie, N Jiang, H Wang, C Xiong… - Advances in neural …, 2021 - proceedings.neurips.cc
Recent theoretical work studies sample-efficient reinforcement learning (RL) extensively in
two settings: learning interactively in the environment (online RL), or learning from an offline …

Nearly minimax optimal reinforcement learning for linear Markov decision processes

J He, H Zhao, D Zhou, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
We study reinforcement learning (RL) with linear function approximation. For episodic time-
inhomogeneous linear Markov decision processes (linear MDPs) whose transition …

A theoretical analysis of optimistic proximal policy optimization in linear Markov decision processes

H Zhong, T Zhang - Advances in Neural Information …, 2024 - proceedings.neurips.cc
The proximal policy optimization (PPO) algorithm stands as one of the most prosperous
methods in the field of reinforcement learning (RL). Despite its success, the theoretical …

Policy finetuning in reinforcement learning via design of experiments using offline data

R Zhang, A Zanette - Advances in Neural Information …, 2024 - proceedings.neurips.cc
In some applications of reinforcement learning, a dataset of pre-collected experience is
already available, but it is also possible to acquire some additional online data to help …

Sample-efficient reinforcement learning with loglog(T) switching cost

D Qiao, M Yin, M Min, YX Wang - … Conference on Machine …, 2022 - proceedings.mlr.press
We study the problem of reinforcement learning (RL) with low (policy) switching cost, a
problem well-motivated by real-life RL applications in which deployments of new policies …

Learn to match with no regret: Reinforcement learning in Markov matching markets

Y Min, T Wang, R Xu, Z Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study a Markov matching market involving a planner and a set of strategic agents on the
two sides of the market. At each step, the agents are presented with a dynamical context …

Sequential information design: Markov persuasion process and its efficient reinforcement learning

J Wu, Z Zhang, Z Feng, Z Wang, Z Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
In today's economy, it is increasingly important for Internet platforms to consider the sequential
information design problem to align their long-term interest with the incentives of the gig service …

Learning stochastic shortest path with linear function approximation

Y Min, J He, T Wang, Q Gu - International Conference on …, 2022 - proceedings.mlr.press
We study the stochastic shortest path (SSP) problem in reinforcement learning with linear
function approximation, where the transition kernel is represented as a linear mixture of …

Variance-aware off-policy evaluation with linear function approximation

Y Min, T Wang, D Zhou, Q Gu - Advances in neural …, 2021 - proceedings.neurips.cc
We study the off-policy evaluation (OPE) problem in reinforcement learning with linear
function approximation, which aims to estimate the value function of a target policy based on …

Online sub-sampling for reinforcement learning with general function approximation

D Kong, R Salakhutdinov, R Wang, LF Yang - arXiv preprint arXiv …, 2021 - arxiv.org
Most of the existing works for reinforcement learning (RL) with general function
approximation (FA) focus on understanding the statistical complexity or regret bounds …