Towards explainable evaluation metrics for machine translation

C Leiter, P Lertvittayakumjorn, M Fomicheva… - Journal of Machine Learning Research, 2024 - jmlr.org
Unlike classical lexical overlap metrics such as BLEU, most current evaluation metrics for
machine translation (for example, COMET or BERTScore) are based on black-box large …
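
As a rough illustration of the contrast this entry draws, the sketch below scores one hypothesis against one reference with a lexical overlap metric (sacreBLEU) and with a neural metric (BERTScore). The sentences, package choices, and default models are assumptions for illustration, not taken from the paper.

    # Minimal sketch: lexical overlap vs. neural similarity for one segment.
    # Assumes sacrebleu and bert-score are installed (pip install sacrebleu bert-score);
    # BERTScore downloads a pretrained model on first use. Example sentences are invented.
    import sacrebleu
    from bert_score import score as bert_score

    hypothesis = "The cat sits on the mat."
    reference = "A cat is sitting on the mat."

    # Lexical overlap: surface n-gram matching, no notion of synonymy.
    bleu = sacrebleu.sentence_bleu(hypothesis, [reference])
    print("BLEU:", bleu.score)

    # Neural metric: similarity of contextual embeddings from a pretrained model.
    P, R, F1 = bert_score([hypothesis], [reference], lang="en")
    print("BERTScore F1:", F1.item())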

Machine translation meta evaluation through translation accuracy challenge sets

N Moghe, A Fazla, C Amrhein, T Kocmi… - Computational Linguistics, 2024 - direct.mit.edu
Recent machine translation (MT) metrics calibrate their effectiveness by correlating with
human judgment. However, these results are often obtained by averaging predictions across …
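
For context, meta-evaluation of this kind usually means correlating metric scores with human judgments over translated segments. Below is a minimal sketch of such a segment-level correlation check, using invented toy scores and SciPy's Kendall tau; it is not the protocol used in the paper.

    # Minimal sketch of segment-level meta-evaluation: correlate metric scores
    # with human judgments. The numbers below are invented toy values.
    from scipy.stats import kendalltau

    human_scores = [0.2, 0.9, 0.5, 0.7, 0.1]    # e.g., normalized human judgments
    metric_scores = [0.3, 0.8, 0.4, 0.6, 0.2]   # scores from some MT metric

    tau, p_value = kendalltau(metric_scores, human_scores)
    print(f"Kendall tau = {tau:.3f} (p = {p_value:.3f})")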

Metric score landscape challenge (MSLC23): Understanding metrics' performance on a wider landscape of translation quality

C Lo, S Larkin, R Knowles - Proceedings of the Eighth Conference on Machine Translation, 2023 - aclanthology.org
The Metric Score Landscape Challenge (MSLC23) dataset aims to gain insight into
metric scores on a wider landscape of machine translation (MT) quality. It provides a …

ACES: Translation accuracy challenge sets at WMT 2023

C Amrhein, N Moghe, L Guillou - arXiv preprint arXiv:2311.01153, 2023 - arxiv.org
We benchmark the performance of segment-level metrics submitted to WMT 2023 using the
ACES Challenge Set (Amrhein et al., 2022). The challenge set consists of 36K examples …

BLEU Meets COMET: Combining Lexical and Neural Metrics Towards Robust Machine Translation Evaluation

T Glushkova, C Zerva, AFT Martins - arXiv preprint arXiv:2305.19144, 2023 - arxiv.org
Although neural-based machine translation evaluation metrics, such as COMET or BLEURT,
have achieved strong correlations with human judgements, they are sometimes unreliable in …
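
As a generic illustration of combining a lexical and a neural score per segment (not necessarily the combination scheme proposed in this paper), one can rescale BLEU to [0, 1] and take a weighted average with a COMET-style score:

    # Generic illustration of combining a lexical and a neural metric score per
    # segment; the weight would normally be tuned against human judgments.
    def combined_score(bleu: float, comet: float, weight: float = 0.5) -> float:
        """Weighted average of a BLEU score rescaled to [0, 1] and a COMET-style score."""
        bleu_01 = bleu / 100.0  # sacreBLEU reports BLEU on a 0-100 scale
        return weight * bleu_01 + (1.0 - weight) * comet

    # Toy values: a fluent but low-overlap hypothesis may get high COMET, low BLEU.
    print(combined_score(bleu=12.0, comet=0.82))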

Multifaceted Challenge Set for Evaluating Machine Translation Performance

X Chen, D Wei, Z Wu, T Zhu, H Shang, Z Li… - Proceedings of the …, 2023 - aclanthology.org
Machine Translation Evaluation is critical to Machine Translation research, as the
evaluation results reflect the effectiveness of training strategies. As a result, a fair and …

Robustness Tests for Automatic Machine Translation Metrics with Adversarial Attacks

Y Huang, T Baldwin - arXiv preprint arXiv:2311.00508, 2023 - arxiv.org
We investigate MT evaluation metric performance on adversarially synthesized texts, to
shed light on metric robustness. We experiment with word- and character-level attacks on …
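
A character-level attack of the kind mentioned here can be as simple as perturbing a few characters of a hypothesis and checking how far a metric's score moves. The sketch below is an assumed minimal version of such a robustness probe (one adjacent-character swap, scored with sacreBLEU), not the attack suite used in the paper.

    # Minimal sketch of a character-level perturbation for metric robustness
    # testing: swap two adjacent characters and compare scores before and after.
    import random
    import sacrebleu

    def char_swap(text: str, rng: random.Random) -> str:
        """Swap one pair of adjacent characters at a random position."""
        if len(text) < 2:
            return text
        i = rng.randrange(len(text) - 1)
        return text[:i] + text[i + 1] + text[i] + text[i + 2:]

    rng = random.Random(0)
    reference = "The committee approved the proposal yesterday."
    hypothesis = "The committee approved the proposal on Friday."

    perturbed = char_swap(hypothesis, rng)
    clean_bleu = sacrebleu.sentence_bleu(hypothesis, [reference]).score
    perturbed_bleu = sacrebleu.sentence_bleu(perturbed, [reference]).score
    print(f"clean: {clean_bleu:.1f}  perturbed: {perturbed_bleu:.1f}")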

Pulling Out All The Full Stops: Punctuation Sensitivity in Neural Machine Translation and Evaluation

P Jwalapuram - Findings of the Association for Computational Linguistics, 2023 - aclanthology.org
Much of the work testing machine translation systems for robustness and sensitivity has
been adversarial or tended towards testing noisy input such as spelling errors, or non …

Segment-level evaluation of machine translation metrics

N Moghe - 2024 - era.ed.ac.uk
Most metrics evaluating Machine Translation (MT) claim their effectiveness by demonstrating
their ability to distinguish the quality of different MT systems over a large corpus (system …

Evaluation of Pre-trained Metrics and ChatGPT as Document-level Machine Translation Metrics

N Bleiker - cl.uzh.ch
Automatic evaluation metrics play an important role in the development and optimization of
machine translation (MT) systems as they are the main method used for evaluating and …