Survey on reinforcement learning for language processing

V Uc-Cetina, N Navarro-Guerrero… - Artificial Intelligence …, 2023 - Springer
In recent years some researchers have explored the use of reinforcement learning (RL)
algorithms as key components in the solution of various natural language processing (NLP) …

A survey of preference-based reinforcement learning methods

C Wirth, R Akrour, G Neumann, J Fürnkranz - Journal of Machine Learning …, 2017 - jmlr.org
Reinforcement learning (RL) techniques optimize the accumulated long-term reward of a
suitably chosen reward function. However, designing such a reward function often requires …

Reinforcement learning for bandit neural machine translation with simulated human feedback

K Nguyen, H Daumé III, J Boyd-Graber - arXiv preprint arXiv:1707.07402, 2017 - arxiv.org
Machine translation is a natural candidate problem for reinforcement learning from human
feedback: users provide quick, dirty ratings on candidate translations to guide a system to …

Communicative feedback in language acquisition

M Nikolaus, A Fourtassi - New Ideas in Psychology, 2023 - Elsevier
Children start to communicate and use language in social interactions from a very young
age. This allows them to experiment with their developing linguistic knowledge and receive …

Reliability and learnability of human bandit feedback for sequence-to-sequence reinforcement learning

J Kreutzer, J Uyheng, S Riezler - arXiv preprint arXiv:1805.10627, 2018 - arxiv.org
We present a study on reinforcement learning (RL) from human bandit feedback for
sequence-to-sequence learning, exemplified by the task of bandit neural machine …

Improving machine translation with human feedback: An exploration of quality estimation as a reward model

Z He, X Wang, W Jiao, Z Zhang, R Wang, S Shi… - arXiv preprint arXiv …, 2024 - arxiv.org
Insufficient modeling of human preferences within the reward model is a major obstacle for
leveraging human feedback to improve translation quality. Fortunately, quality estimation …

APRIL: Interactively learning to summarise by combining active preference learning and reinforcement learning

Y Gao, CM Meyer, I Gurevych - arXiv preprint arXiv:1808.09658, 2018 - arxiv.org
We propose a method to perform automatic document summarisation without using
reference summaries. Instead, our method interactively learns from users' preferences. The …

An imitation game for learning semantic parsers from user interaction

Z Yao, Y Tang, W Yih, H Sun, Y Su - arXiv preprint arXiv:2005.00689, 2020 - arxiv.org
Despite widely successful applications, bootstrapping and fine-tuning semantic parsers
remain a tedious process with challenges such as costly data annotation and privacy risks …

Simulating bandit learning from user feedback for extractive question answering

G Gao, E Choi, Y Artzi - arXiv preprint arXiv:2203.10079, 2022 - arxiv.org
We study learning from user feedback for extractive question answering by simulating
feedback using supervised data. We cast the problem as contextual bandit learning, and …

Bandit structured prediction for neural sequence-to-sequence learning

J Kreutzer, A Sokolov, S Riezler - arXiv preprint arXiv:1704.06497, 2017 - arxiv.org
Bandit structured prediction describes a stochastic optimization framework where learning is
performed from partial feedback. This feedback is received in the form of a task loss …