Unifying PAC and regret: Uniform PAC bounds for episodic reinforcement learning

J Gao, M Galley, L Li - The 41st international ACM SIGIR conference on …, 2018 - dl.acm.org

This tutorial surveys neural approaches to conversational AI that were developed in the last
few years. We group conversational systems into three categories:(1) question answering …

Cited by 915 Related articles All 16 versions

[PDF] wiley.com

Recent advances in reinforcement learning in finance

B Hambly, R Xu, H Yang - Mathematical Finance, 2023 - Wiley Online Library

The rapid changes in the finance industry due to the increasing amount of data have
revolutionized the techniques on data processing and data analysis and brought new …

Cited by 196 Related articles All 13 versions

[PDF] mlr.press

Provably efficient reinforcement learning with linear function approximation

C Jin, Z Yang, Z Wang… - Conference on learning …, 2020 - proceedings.mlr.press

Modern Reinforcement Learning (RL) is commonly applied to practical problems
with an enormous number of states, where function approximation must be deployed …

Cited by 768 Related articles All 4 versions

[PDF] mlr.press

When is partially observable reinforcement learning not scary?

Q Liu, A Chung, C Szepesvári… - Conference on Learning …, 2022 - proceedings.mlr.press

Partial observability is ubiquitous in applications of Reinforcement Learning (RL), in which
agents learn to make a sequence of decisions despite lacking complete information about …

Cited by 110 Related articles All 7 versions

[PDF] neurips.cc

Policy finetuning: Bridging sample-efficient offline and online reinforcement learning

T Xie, N Jiang, H Wang, C Xiong… - Advances in neural …, 2021 - proceedings.neurips.cc

Recent theoretical work studies sample-efficient reinforcement learning (RL) extensively in
two settings: learning interactively in the environment (online RL), or learning from an offline …

Cited by 180 Related articles All 9 versions

[PDF] mlr.press

Model-based reinforcement learning with value-targeted regression

A Ayoub, Z Jia, C Szepesvari… - … on Machine Learning, 2020 - proceedings.mlr.press

This paper studies model-based reinforcement learning (RL) for regret minimization. We
focus on finite-horizon episodic RL where the transition model $P$ belongs to a known …

Cited by 350 Related articles All 8 versions

[PDF] mlr.press

Provably efficient exploration in policy optimization

Q Cai, Z Yang, C Jin, Z Wang - International Conference on …, 2020 - proceedings.mlr.press

While policy-based reinforcement learning (RL) achieves tremendous successes in practice,
it is significantly less understood in theory, especially compared with value-based RL. In …

Cited by 320 Related articles All 9 versions

[PDF] abracadoudou.com

A study on overfitting in deep reinforcement learning

C Zhang, O Vinyals, R Munos, S Bengio - arXiv preprint arXiv:1804.06893, 2018 - arxiv.org

Recent years have witnessed significant progresses in deep Reinforcement Learning (RL).
Empowered with large scale neural networks, carefully designed architectures, novel …

Cited by 503 Related articles All 6 versions

[PDF] mlr.press

Tighter problem-dependent regret bounds in reinforcement learning without domain knowledge using value function bounds

A Zanette, E Brunskill - International Conference on Machine …, 2019 - proceedings.mlr.press

Strong worst-case performance bounds for episodic reinforcement learning exist but
fortunately in practice RL algorithms perform much better than such bounds would predict …

Cited by 318 Related articles All 8 versions

[PDF] mlr.press

A sharp analysis of model-based reinforcement learning with self-play

Q Liu, T Yu, Y Bai, C Jin - International Conference on …, 2021 - proceedings.mlr.press

Model-based algorithms (algorithms that explore the environment through building
and utilizing an estimated model) are widely used in reinforcement learning practice and …

Cited by 159 Related articles All 6 versions
