Iterative preference learning from human feedback: Bridging theory and practice for RLHF under KL-constraint

W Xiong, H Dong, C Ye, Z Wang, H Zhong… - International Conference on Machine Learning, 2024 - openreview.net
This paper studies the theoretical framework of the alignment process of generative models
with Reinforcement Learning from Human Feedback (RLHF). We consider a standard …

DPO meets PPO: Reinforced token optimization for RLHF

H Zhong, G Feng, W Xiong, X Cheng, L Zhao… - arXiv preprint, 2024 - arxiv.org
In the classical Reinforcement Learning from Human Feedback (RLHF) framework, Proximal
Policy Optimization (PPO) is employed to learn from sparse, sentence-level rewards--a …

Corruption-robust offline reinforcement learning with general function approximation

C Ye, R Yang, Q Gu, T Zhang - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
We investigate the problem of corruption robustness in offline reinforcement learning (RL)
with general function approximation, where an adversary can corrupt each sample in the …

Double pessimism is provably efficient for distributionally robust offline reinforcement learning: Generic algorithm and robust partial coverage

J Blanchet, M Lu, T Zhang… - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
We study distributionally robust offline reinforcement learning (RL), which seeks to find an
optimal robust policy purely from an offline dataset that can perform well in perturbed …

Breaking the curse of multiagents in a large state space: RL in Markov games with independent linear function approximation

Q Cui, K Zhang, S Du - The Thirty Sixth Annual Conference on Learning Theory, 2023 - proceedings.mlr.press
We propose a new model, independent linear Markov game, for multi-agent
reinforcement learning with a large state space and a large number of agents. This is a class …

Policy finetuning in reinforcement learning via design of experiments using offline data

R Zhang, A Zanette - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
In some applications of reinforcement learning, a dataset of pre-collected experience is
already available, but it is also possible to acquire some additional online data to help …

A theoretical analysis of Nash learning from human feedback under general KL-regularized preference

C Ye, W Xiong, Y Zhang, N Jiang, T Zhang - arXiv preprint, 2024 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) learns from the preference signal
provided by a probabilistic preference model, which takes a prompt and two responses as …

A self-play posterior sampling algorithm for zero-sum Markov games

W Xiong, H Zhong, C Shi, C Shen… - International Conference on Machine Learning, 2022 - proceedings.mlr.press
Existing studies on provably efficient algorithms for Markov games (MGs) almost exclusively
build on the “optimism in the face of uncertainty” (OFU) principle. This work focuses on a …

Towards robust offline reinforcement learning under diverse data corruption

R Yang, H Zhong, J Xu, A Zhang, C Zhang… - arXiv preprint, 2023 - arxiv.org
Offline reinforcement learning (RL) presents a promising approach for learning reinforced
policies from offline datasets without the need for costly or unsafe interactions with the …

On sample-efficient offline reinforcement learning: Data diversity, posterior sampling and beyond

T Nguyen-Tang, R Arora - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
We seek to understand what facilitates sample-efficient learning from historical datasets for
sequential decision-making, a problem that is popularly known as offline reinforcement …