Near-optimal offline reinforcement learning with linear representation: Leveraging variance information with pessimism

M Yin, Y Duan, M Wang, YX Wang - arXiv preprint arXiv:2203.05804, 2022 - arxiv.org
Offline reinforcement learning, which seeks to utilize offline/historical data to optimize
sequential decision-making strategies, has gained surging prominence in recent studies …

Nearly minimax optimal offline reinforcement learning with linear function approximation: Single-agent MDP and Markov game

W Xiong, H Zhong, C Shi, C Shen, L Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Offline reinforcement learning (RL) aims at learning an optimal strategy using a pre-
collected dataset without further interactions with the environment. While various algorithms …

Posterior sampling with delayed feedback for reinforcement learning with linear function approximation

NL Kuang, M Yin, M Wang… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent studies in reinforcement learning (RL) have made significant progress by leveraging
function approximation to alleviate the sample complexity hurdle for better performance …

Learn to match with no regret: Reinforcement learning in Markov matching markets

Y Min, T Wang, R Xu, Z Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study a Markov matching market involving a planner and a set of strategic agents on the
two sides of the market. At each step, the agents are presented with a dynamical context …

Learning stochastic shortest path with linear function approximation

Y Min, J He, T Wang, Q Gu - International Conference on …, 2022 - proceedings.mlr.press
We study the stochastic shortest path (SSP) problem in reinforcement learning with linear
function approximation, where the transition kernel is represented as a linear mixture of …

Pessimism in the face of confounders: Provably efficient offline reinforcement learning in partially observable Markov decision processes

M Lu, Y Min, Z Wang, Z Yang - arXiv preprint arXiv:2205.13589, 2022 - arxiv.org
We study offline reinforcement learning (RL) in partially observable Markov decision
processes. In particular, we aim to learn an optimal policy from a dataset collected by a …

Provable benefit of multitask representation learning in reinforcement learning

Y Cheng, S Feng, J Yang, H Zhang… - Advances in Neural …, 2022 - proceedings.neurips.cc
As representation learning becomes a powerful technique to reduce sample complexity in
reinforcement learning (RL) in practice, the theoretical understanding of its advantage is still …

Noise-adaptive Thompson sampling for linear contextual bandits

R Xu, Y Min, T Wang - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Linear contextual bandits represent a fundamental class of models with numerous real-
world applications, and it is critical to develop algorithms that can effectively manage noise …

Sample complexity of nonparametric off-policy evaluation on low-dimensional manifolds using deep networks

X Ji, M Chen, M Wang, T Zhao - arXiv preprint arXiv:2206.02887, 2022 - arxiv.org
We consider the off-policy evaluation problem of reinforcement learning using deep
convolutional neural networks. We analyze the deep fitted Q-evaluation method for …

Cascaded gaps: Towards logarithmic regret for risk-sensitive reinforcement learning

Y Fei, R Xu - International Conference on Machine Learning, 2022 - proceedings.mlr.press
In this paper, we study gap-dependent regret guarantees for risk-sensitive reinforcement
learning based on the entropic risk measure. We propose a novel definition of sub-optimality …