Neural approaches to conversational AI

J Gao, M Galley, L Li - The 41st international ACM SIGIR conference on …, 2018 - dl.acm.org
This tutorial surveys neural approaches to conversational AI that were developed in the last
few years. We group conversational systems into three categories: (1) question answering …

BERTScore: Evaluating text generation with BERT

T Zhang, V Kishore, F Wu, KQ Weinberger… - arXiv preprint arXiv …, 2019 - arxiv.org
We propose BERTScore, an automatic evaluation metric for text generation. Analogously to
common metrics, BERTScore computes a similarity score for each token in the candidate …
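The snippet above describes BERTScore's token-level similarity. Here is a minimal sketch, assuming the contextual token embeddings are already computed. The toy 2-D vectors are hypothetical stand-ins for BERT outputs, and the helper name `greedy_bertscore` is illustrative. Each token is greedily matched to its most cosine-similar token on the other side. Averaging those maxima over candidate tokens gives precision, and averaging them over reference tokens gives recall. The optional idf weighting and baseline rescaling are omitted.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("zero-length embedding")
    return [x / norm for x in v]


def _dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def greedy_bertscore(candidate: Sequence[Vector],
                     reference: Sequence[Vector]) -> Tuple[float, float, float]:
    """Hypothetical helper: BERTScore-style P/R/F1 from per-token embeddings.

    No idf weighting or baseline rescaling is applied.
    """
    if not candidate or not reference:
        raise ValueError("both sentences need at least one token")
    cand = [_normalize(v) for v in candidate]
    ref = [_normalize(v) for v in reference]
    # Precision: match each candidate token to its most similar reference token.
    precision = sum(max(_dot(c, r) for r in ref) for c in cand) / len(cand)
    # Recall: match each reference token to its most similar candidate token.
    recall = sum(max(_dot(r, c) for c in cand) for r in ref) / len(ref)
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom != 0 else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Toy 2-D "embeddings": the candidate has one extra token unrelated to the reference.
    candidate = [[1.0, 0.0], [0.0, 1.0]]
    reference = [[1.0, 0.0]]
    p, r, f = greedy_bertscore(candidate, reference)
    print(round(p, 3), round(r, 3), round(f, 3))  # prints 0.5 1.0 0.667
```

The extra candidate token lowers precision but leaves recall untouched. Greedy matching therefore penalizes additions and omissions separately.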

Re-evaluating evaluation in text summarization

M Bhandari, P Gour, A Ashfaq, P Liu… - arXiv preprint arXiv …, 2020 - arxiv.org
Automated evaluation metrics as a stand-in for manual evaluation are an essential part of
the development of text-generation tasks such as text summarization. However, while the …

Tangled up in BLEU: Reevaluating the evaluation of automatic machine translation evaluation metrics

N Mathur, T Baldwin, T Cohn - arXiv preprint arXiv:2006.06264, 2020 - arxiv.org
Automatic metrics are fundamental for the development and evaluation of machine
translation systems. Judging whether, and to what extent, automatic metrics concur with the …
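Judging how far automatic metrics concur with human judgments is usually done by correlating the two sets of scores. The WMT metrics tasks listed here report system-level Pearson correlation, among other statistics. Below is a minimal pure-Python sketch of Pearson's r. The function name and the score lists are hypothetical, and the paper's own analysis is not reproduced.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson's r between two equally long score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, left unnormalized (the 1/n factors cancel in r).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant scores")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical system-level scores: one metric score and one human score per MT system.
    metric_scores = [0.31, 0.28, 0.35, 0.22]
    human_scores = [0.10, -0.05, 0.20, -0.30]
    print(f"r = {pearson(metric_scores, human_scores):.3f}")
```

Each list has one entry per translation system, so with only a handful of systems a single unusual system can move r substantially.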

Results of the WMT20 metrics shared task

N Mathur, J Wei, M Freitag, Q Ma… - Proceedings of the Fifth …, 2020 - aclanthology.org
This paper presents the results of the WMT20 Metrics Shared Task. Participants were asked
to score the outputs of the translation systems competing in the WMT20 News Translation …

INSTRUCTSCORE: Explainable Text Generation Evaluation with Finegrained Feedback

W Xu, D Wang, L Pan, Z Song, M Freitag… - arXiv preprint arXiv …, 2023 - arxiv.org
Automatically evaluating the quality of language generation is critical. Although recent
learned metrics show high correlation with human judgement, these metrics cannot explain …

Results of the WMT19 metrics shared task: Segment-level and strong MT systems pose big challenges

Q Ma, JTZ Wei, O Bojar, Y Graham - 2019 - doras.dcu.ie
This paper presents the results of the WMT19 Metrics Shared Task. Participants were asked
to score the outputs of the translation systems competing in the WMT19 News Translation …

The first multilingual surface realisation shared task (SR'18): Overview and evaluation results

S Mille, A Belz, B Bohnet, Y Graham… - Proceedings of the …, 2018 - aclanthology.org
We report results from the SR'18 Shared Task, a new multilingual surface realisation task
organised as part of the ACL'18 Workshop on Multilingual Surface Realisation. As in its …

Handling divergent reference texts when evaluating table-to-text generation

B Dhingra, M Faruqui, A Parikh, MW Chang… - arXiv preprint arXiv …, 2019 - arxiv.org
Automatically constructed datasets for generating text from semi-structured data (tables),
such as WikiBio, often contain reference texts that diverge from the information in the …

Results of the WMT16 metrics shared task

O Bojar, Y Graham, A Kamran… - Proceedings of the First …, 2016 - aclanthology.org
This paper presents the results of the WMT16 Metrics Shared Task. We asked participants of
this task to score the outputs of the MT systems involved in the WMT16 Shared Translation …