Policy finetuning: Bridging sample-efficient offline and online reinforcement learning

T Xie, N Jiang, H Wang, C Xiong… - Advances in neural …, 2021 - proceedings.neurips.cc
Recent theoretical work studies sample-efficient reinforcement learning (RL) extensively in
two settings: learning interactively in the environment (online RL), or learning from an offline …

Nearly minimax optimal reinforcement learning for linear Markov decision processes

J He, H Zhao, D Zhou, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
We study reinforcement learning (RL) with linear function approximation. For episodic time-
inhomogeneous linear Markov decision processes (linear MDPs) whose transition …

A theoretical analysis of optimistic proximal policy optimization in linear Markov decision processes

H Zhong, T Zhang - Advances in Neural Information …, 2024 - proceedings.neurips.cc
The proximal policy optimization (PPO) algorithm stands as one of the most prosperous
methods in the field of reinforcement learning (RL). Despite its success, the theoretical …

Policy finetuning in reinforcement learning via design of experiments using offline data

R Zhang, A Zanette - Advances in Neural Information …, 2024 - proceedings.neurips.cc
In some applications of reinforcement learning, a dataset of pre-collected experience is
already available, but it is also possible to acquire some additional online data to help …

Sample-efficient reinforcement learning with loglog(T) switching cost

D Qiao, M Yin, M Min, YX Wang - … Conference on Machine …, 2022 - proceedings.mlr.press
We study the problem of reinforcement learning (RL) with low (policy) switching cost, a
problem well-motivated by real-life RL applications in which deployments of new policies …

Learn to match with no regret: Reinforcement learning in Markov matching markets

Y Min, T Wang, R Xu, Z Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study a Markov matching market involving a planner and a set of strategic agents on the
two sides of the market. At each step, the agents are presented with a dynamical context …

Sequential information design: Markov persuasion process and its efficient reinforcement learning

J Wu, Z Zhang, Z Feng, Z Wang, Z Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
In today's economy, it is increasingly important for Internet platforms to consider the sequential
information design problem to align their long-term interest with the incentives of the gig service …

Learning stochastic shortest path with linear function approximation

Y Min, J He, T Wang, Q Gu - International Conference on …, 2022 - proceedings.mlr.press
We study the stochastic shortest path (SSP) problem in reinforcement learning with linear
function approximation, where the transition kernel is represented as a linear mixture of …

Variance-aware off-policy evaluation with linear function approximation

Y Min, T Wang, D Zhou, Q Gu - Advances in neural …, 2021 - proceedings.neurips.cc
We study the off-policy evaluation (OPE) problem in reinforcement learning with linear
function approximation, which aims to estimate the value function of a target policy based on …

Online sub-sampling for reinforcement learning with general function approximation

D Kong, R Salakhutdinov, R Wang, LF Yang - arXiv preprint arXiv …, 2021 - arxiv.org
Most of the existing works for reinforcement learning (RL) with general function
approximation (FA) focus on understanding the statistical complexity or regret bounds …