Learning structured predictors from bandit feedback for interactive NLP

A Sokolov, J Kreutzer, C Lo… - Proceedings of the 54th …, 2016 - aclanthology.org

Proceedings of the 54th Annual Meeting of the Association for …, 2016•aclanthology.org

Abstract

Structured prediction from bandit feedback describes a learning scenario where, instead of having access to a gold standard structure, a learner only receives partial feedback in the form of the loss value of a predicted structure. We present new learning objectives and algorithms for this interactive scenario, focusing on convergence speed and ease of elicitability of feedback. We present supervised-to-bandit simulation experiments for several NLP tasks (machine translation, sequence labeling, text classification), showing that bandit learning from relative preferences eases feedback strength and yields improved empirical convergence.
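To make the scenario concrete, the following is a minimal sketch of a supervised-to-bandit simulation for text classification. It is not the paper's algorithm: it swaps in a generic score-function (REINFORCE-style) gradient on the expected loss with a constant baseline. The function names, the one-hot toy data, and the hyperparameters (`steps`, `lr`, `baseline`) are all assumptions chosen for illustration. The gold label is used only to simulate the scalar loss; the learner itself never sees it.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def train_bandit(data, num_classes, num_features,
                 steps=2000, lr=0.5, baseline=0.5, seed=0):
    """Train a linear softmax classifier from simulated bandit feedback."""
    rng = random.Random(seed)
    w = [[0.0] * num_features for _ in range(num_classes)]
    for _ in range(steps):
        x, gold = rng.choice(data)
        scores = [sum(wk[i] * x[i] for i in range(num_features)) for wk in w]
        probs = softmax(scores)
        # The learner commits to one sampled prediction ...
        y = rng.choices(range(num_classes), weights=probs)[0]
        # ... and only the loss of that prediction is revealed (0/1 loss here).
        loss = 0.0 if y == gold else 1.0
        advantage = loss - baseline  # constant baseline (an assumption)
        # Score-function estimate: advantage * grad log p(y | x).
        for k in range(num_classes):
            g = (1.0 if k == y else 0.0) - probs[k]
            for i in range(num_features):
                w[k][i] -= lr * advantage * g * x[i]
    return w


def predict(w, x):
    """Greedy prediction: the class with the highest linear score."""
    scores = [sum(wk[i] * x[i] for i in range(len(x))) for wk in w]
    return max(range(len(w)), key=lambda k: scores[k])


# Hypothetical toy data: one-hot inputs, each labelled with its own index.
toy_data = [([1.0, 0.0, 0.0], 0), ([0.0, 1.0, 0.0], 1), ([0.0, 0.0, 1.0], 2)]
weights = train_bandit(toy_data, num_classes=3, num_features=3)
predictions = [predict(weights, x) for x, _ in toy_data]
```

The update never touches `gold` except to compute `loss`, which is what distinguishes the bandit setting from full supervision. Learning from relative preferences, as the abstract describes, would replace the absolute loss with a comparison between two sampled predictions; that variant is not shown here.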

Cited by 39 · All 6 versions
