A survey of evaluation metrics used for NLG systems

AB Sai, AK Mohankumar, MM Khapra - ACM Computing Surveys (CSUR …, 2022 - dl.acm.org
In the last few years, a large number of automatic evaluation metrics have been proposed for
evaluating Natural Language Generation (NLG) systems. The rapid development and …

BERT: a review of applications in natural language processing and understanding

MV Koroteev - arXiv preprint arXiv:2103.11943, 2021 - arxiv.org
In this review, we describe the application of one of the most popular deep learning-based
language models, BERT. The paper describes the mechanism of operation of this model, the …

BERTScore: Evaluating text generation with BERT

T Zhang, V Kishore, F Wu, KQ Weinberger… - arXiv preprint arXiv …, 2019 - arxiv.org
We propose BERTScore, an automatic evaluation metric for text generation. Analogously to
common metrics, BERTScore computes a similarity score for each token in the candidate …
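The snippet only hints at how the token-level similarities become a score. In BERTScore, each candidate token is greedily matched to its most similar reference token by cosine similarity of contextual embeddings (and vice versa), giving precision, recall, and F1. The sketch below assumes the per-token embeddings are already computed and passed in as plain lists of floats; the function name `greedy_match_scores` is hypothetical, and the sketch omits the paper's optional idf weighting and baseline rescaling.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def greedy_match_scores(
    candidate: List[Vector], reference: List[Vector]
) -> Tuple[float, float, float]:
    """BERTScore-style precision, recall and F1 via greedy max-cosine matching.

    Hypothetical helper: inputs are precomputed per-token embeddings
    (the real metric uses contextual BERT embeddings).
    """
    # Similarity of every candidate token (rows) to every reference token (columns).
    sim = [[cosine(c, r) for r in reference] for c in candidate]
    # Precision: each candidate token takes its best-matching reference token.
    precision = sum(max(row) for row in sim) / len(candidate)
    # Recall: each reference token takes its best-matching candidate token.
    recall = sum(
        max(sim[i][j] for i in range(len(candidate))) for j in range(len(reference))
    ) / len(reference)
    # Harmonic mean; guard against a zero denominator.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

With toy 2-d "embeddings", identical token lists score 1.0 on all three measures, while partially aligned tokens are credited by their cosine similarity rather than by exact string overlap.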

Results of the WMT19 metrics shared task: Segment-level and strong MT systems pose big challenges

Q Ma, JTZ Wei, O Bojar, Y Graham - 2019 - doras.dcu.ie
This paper presents the results of the WMT19 Metrics Shared Task. Participants were asked
to score the outputs of the translation systems competing in the WMT19 News Translation …

Automatic machine translation evaluation in many languages via zero-shot paraphrasing

B Thompson, M Post - arXiv preprint arXiv:2004.14564, 2020 - arxiv.org
We frame the task of machine translation evaluation as one of scoring machine translation
output with a sequence-to-sequence paraphraser, conditioned on a human reference. We …

Findings of the 2014 workshop on statistical machine translation

O Bojar, C Buck, C Federmann, B Haddow… - Proceedings of the …, 2014 - aclanthology.org
This paper presents the results of the WMT14 shared tasks, which included a standard news
translation task, a separate medical translation task, a task for run-time estimation of …

METEOR: An automatic metric for MT evaluation with improved correlation with human judgments

S Banerjee, A Lavie - Proceedings of the ACL workshop on intrinsic …, 2005 - aclanthology.org
We describe METEOR, an automatic metric for machine translation evaluation that is based
on a generalized concept of unigram matching between the machine-produced translation …
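To make "unigram matching" concrete, here is a minimal sketch of METEOR's scoring step, assuming exact word matches only. METEOR combines unigram precision $P$ and recall $R$ as $F_{mean} = 10PR/(R + 9P)$ and applies a fragmentation penalty $0.5\,(\text{chunks}/\text{matches})^3$, where a chunk is a run of matched words that are contiguous in both the candidate and the reference. The function name `meteor_exact` is hypothetical. The original metric also matches Porter-stemmed forms and WordNet synonyms and selects the alignment with the fewest crossings; this sketch simplifies that to a greedy left-to-right exact alignment.

```python
from typing import Dict, List, Tuple


def meteor_exact(candidate: str, reference: str) -> float:
    """Simplified METEOR score using exact unigram matches only (hypothetical helper)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    # Queue of reference positions for each word, in left-to-right order.
    positions: Dict[str, List[int]] = {}
    for j, word in enumerate(ref):
        positions.setdefault(word, []).append(j)
    # Greedy alignment: each candidate word takes the first unused reference occurrence.
    alignment: List[Tuple[int, int]] = []
    for i, word in enumerate(cand):
        if positions.get(word):
            alignment.append((i, positions[word].pop(0)))
    matches = len(alignment)
    if matches == 0:
        return 0.0
    precision = matches / len(cand)
    recall = matches / len(ref)
    # Recall-weighted harmonic mean, as in the METEOR paper.
    fmean = 10 * precision * recall / (recall + 9 * precision)
    # Count chunks: a new chunk starts whenever adjacency breaks on either side.
    chunks = 1
    for (i0, j0), (i1, j1) in zip(alignment, alignment[1:]):
        if not (i1 == i0 + 1 and j1 == j0 + 1):
            chunks += 1
    penalty = 0.5 * (chunks / matches) ** 3
    return fmean * (1 - penalty)
```

For example, `meteor_exact("the cat the mat", "the cat sat on the mat")` matches four words in two chunks, so it gets perfect precision, recall of 2/3, and a penalty of 0.0625, while an identical sentence pair is penalized only slightly for being a single chunk.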

Results of the WMT16 metrics shared task

O Bojar, Y Graham, A Kamran… - Proceedings of the First …, 2016 - aclanthology.org
This paper presents the results of the WMT16 Metrics Shared Task. We asked participants of
this task to score the outputs of the MT systems involved in the WMT16 Shared Translation …

[Book] Handbook of natural language processing

N Indurkhya, FJ Damerau - 2010 - taylorfrancis.com
The Handbook of Natural Language Processing, Second Edition presents practical tools
and techniques for implementing natural language processing in computer systems. Along …

Overview of the 8th workshop on Asian translation

T Nakazawa, H Nakayama, C Ding… - Proceedings of the …, 2021 - aclanthology.org
This paper presents the results of the shared tasks from the 8th workshop on Asian
translation (WAT2021). For WAT2021, 28 teams participated in the shared tasks and 24 …