having access to a gold standard structure, a learner receives only partial feedback in the form
of the loss value of a predicted structure. We present new learning objectives and algorithms
for this interactive scenario, focusing on convergence speed and on how easily the required
feedback can be elicited. We report supervised-to-bandit simulation experiments for several NLP tasks
(machine translation, sequence labeling, text classification), showing that bandit learning …