Learning structured predictors from bandit feedback for interactive NLP

A Sokolov, J Kreutzer, C Lo… - Proceedings of the 54th …, 2016 - aclanthology.org
Proceedings of the 54th Annual Meeting of the Association for …, 2016 - aclanthology.org
Abstract
Structured prediction from bandit feedback describes a learning scenario where, instead of having access to a gold standard structure, a learner only receives partial feedback in the form of the loss value of a predicted structure. We present new learning objectives and algorithms for this interactive scenario, focusing on convergence speed and ease of elicitability of feedback. We present supervised-to-bandit simulation experiments for several NLP tasks (machine translation, sequence labeling, text classification), showing that bandit learning from relative preferences eases feedback strength and yields improved empirical convergence.
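The scenario in the abstract can be illustrated with a toy supervised-to-bandit simulation for text classification. The sketch below is not the paper's algorithm: it assumes a generic log-linear model trained with a score-function (REINFORCE-style) estimate of the expected-loss gradient, and the names `bandit_train` and `predict`, the 0/1 loss, and the learning rate are all hypothetical choices made for illustration. The gold label is used only to simulate the loss value; the learner never sees it directly.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def class_scores(w, x):
    """Linear score of each class for feature vector x."""
    return [sum(wi * xi for wi, xi in zip(w_k, x)) for w_k in w]


def bandit_train(examples, num_classes, num_features,
                 lr=0.5, epochs=200, seed=0):
    """Train a log-linear classifier from simulated bandit feedback.

    Assumption: 0/1 loss and a plain stochastic gradient step; both are
    illustrative choices, not taken from the paper.
    """
    rng = random.Random(seed)
    w = [[0.0] * num_features for _ in range(num_classes)]
    for _ in range(epochs):
        for x, gold in examples:
            probs = softmax(class_scores(w, x))
            # Sample a prediction from the model's current distribution.
            y = rng.choices(range(num_classes), weights=probs)[0]
            # Bandit feedback: only the loss of the sampled prediction
            # is revealed (simulated here from the gold label).
            loss = 0.0 if y == gold else 1.0
            # Score-function estimate of the expected-loss gradient:
            # loss * d/dw log p(y | x) = loss * (1[k == y] - p_k) * x
            for k in range(num_classes):
                coeff = loss * ((1.0 if k == y else 0.0) - probs[k])
                for j in range(num_features):
                    w[k][j] -= lr * coeff * x[j]
    return w


def predict(w, x):
    """Return the highest-scoring class for x."""
    scores = class_scores(w, x)
    return max(range(len(scores)), key=scores.__getitem__)
```

With 0/1 loss, a correct sample yields no update, so the model only moves when a sampled prediction is penalized. This is one simple way to see why the strength and form of the feedback signal affect convergence speed.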