Recent machine translation (MT) metrics validate their effectiveness by correlating with human judgments. However, these results are often obtained by averaging predictions across …
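As a minimal sketch of how such correlation is typically computed, the snippet below correlates segment-level metric scores with human judgments using Kendall's tau and Pearson's r; the score arrays are invented placeholders, not real data:

```python
# Minimal sketch: correlating metric scores with human judgments.
from scipy.stats import kendalltau, pearsonr

# Hypothetical segment-level scores for the same set of translations.
metric_scores = [0.71, 0.45, 0.88, 0.32, 0.69, 0.54]
human_scores = [78.0, 40.0, 91.0, 35.0, 62.0, 58.0]  # e.g., direct assessment (DA) scores

tau, _ = kendalltau(metric_scores, human_scores)
r, _ = pearsonr(metric_scores, human_scores)
print(f"Kendall tau: {tau:.3f}, Pearson r: {r:.3f}")
```

Averaging such correlations across domains or systems can mask cases where a metric behaves well on aggregate but poorly on particular slices of the data.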
C Lo, S Larkin, R Knowles - Proceedings of the Eighth Conference on Machine Translation, 2023 - aclanthology.org
The Metric Score Landscape Challenge (MSLC23) dataset aims to gain insight into metric scores on a broader landscape of machine translation (MT) quality. It provides a …
We benchmark the performance of segment-level metrics submitted to WMT 2023 using the ACES Challenge Set (Amrhein et al., 2022). The challenge set consists of 36K examples …
Although neural-based machine translation evaluation metrics, such as COMET or BLEURT, have achieved strong correlations with human judgements, they are sometimes unreliable in …
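As an illustration, scoring hypotheses with COMET via the unbabel-comet package looks roughly like the sketch below; the model name and predict interface follow the package's published usage, while the example sentences are invented:

```python
# pip install unbabel-comet
from comet import download_model, load_from_checkpoint

# Download and load a released COMET checkpoint.
model_path = download_model("Unbabel/wmt22-comet-da")
model = load_from_checkpoint(model_path)

# Each item pairs a source sentence, an MT hypothesis, and a reference.
data = [{
    "src": "Der Hund bellt.",
    "mt": "The dog is barking.",
    "ref": "The dog barks.",
}]
output = model.predict(data, batch_size=8, gpus=0)
print(output.scores)        # segment-level scores
print(output.system_score)  # corpus-level average
```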
Machine Translation Evaluation is critical to Machine Translation research, as the evaluation results reflect the effectiveness of training strategies. As a result, a fair and …
Y Huang, T Baldwin - arXiv preprint arXiv:2311.00508, 2023 - arxiv.org
We investigate MT evaluation metric performance on adversarially synthesized texts, to shed light on metric robustness. We experiment with word- and character-level attacks on …
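A toy version of a character-level attack is sketched below; the helper name and swap strategy are our own illustration, not the attack used in the paper. A robust metric should score the perturbed hypothesis lower than the clean one:

```python
import random

def char_swap_attack(text: str, n_swaps: int = 1, seed: int = 0) -> str:
    """Swap adjacent characters at random positions (a toy character-level attack)."""
    rng = random.Random(seed)
    chars = list(text)
    for _ in range(n_swaps):
        if len(chars) < 2:
            break
        i = rng.randrange(len(chars) - 1)
        chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

original = "The quick brown fox jumps over the lazy dog."
perturbed = char_swap_attack(original, n_swaps=3)
# Score both versions with the metric under test and compare;
# the gap indicates how sensitive the metric is to the perturbation.
print(perturbed)
```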
P Jwalapuram - Findings of the Association for Computational Linguistics, 2023 - aclanthology.org
Much of the work testing machine translation systems for robustness and sensitivity has been adversarial or has tended towards testing noisy input such as spelling errors or non …
Most metrics for evaluating Machine Translation (MT) claim effectiveness by demonstrating their ability to distinguish the quality of different MT systems over a large corpus (system …
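System-level correlation is usually computed by averaging a metric's segment scores per system and correlating those averages with human system-level ratings; a sketch with invented numbers:

```python
from statistics import mean
from scipy.stats import pearsonr

# Hypothetical segment-level metric scores grouped by MT system.
segment_scores = {
    "system_A": [0.71, 0.62, 0.80],
    "system_B": [0.55, 0.49, 0.60],
    "system_C": [0.66, 0.58, 0.72],
}
# Hypothetical human system-level ratings for the same systems.
human_ratings = {"system_A": 82.0, "system_B": 61.0, "system_C": 74.0}

systems = sorted(segment_scores)
metric_sys = [mean(segment_scores[s]) for s in systems]
human_sys = [human_ratings[s] for s in systems]

r, _ = pearsonr(metric_sys, human_sys)
print(f"System-level Pearson r: {r:.3f}")
```

High system-level correlation can coexist with weak segment-level discrimination, which is why both granularities are reported.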
Automatic evaluation metrics play an important role in the development and optimization of machine translation (MT) systems, as they are the main method used for evaluating and …