Sentence-level Aggregation of Lexical Metrics Correlate Stronger with Human Judgements than Corpus-level Aggregation

P Cavalin, PH Domingues, C Pinhanez - arXiv preprint arXiv:2407.12832, 2024 - arxiv.org
In this paper, we show that corpus-level aggregation considerably hinders the ability of
lexical metrics to accurately evaluate machine translation (MT) systems. With empirical …
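The two aggregation strategies can differ sharply even on a tiny test set. The sketch below is illustrative only. It uses clipped unigram precision as a simplified stand-in for a lexical metric, not the metrics or data evaluated in the paper, and the helper names (`clipped_matches`, `corpus_level`, `sentence_level`) are hypothetical. Corpus-level aggregation pools match and length counts over all segments and then divides once. Sentence-level aggregation scores each segment separately and then averages those scores.

```python
from collections import Counter


def clipped_matches(hyp: str, ref: str) -> tuple[int, int]:
    """Return (clipped unigram matches, hypothesis length) for one segment.

    Hypothetical helper: a simplified lexical metric used only for illustration.
    """
    h, r = Counter(hyp.split()), Counter(ref.split())
    matches = sum(min(count, r[word]) for word, count in h.items())
    return matches, sum(h.values())


def corpus_level(hyps: list[str], refs: list[str]) -> float:
    # Pool counts over the whole corpus, then compute a single ratio.
    total_matches = total_len = 0
    for hyp, ref in zip(hyps, refs):
        m, n = clipped_matches(hyp, ref)
        total_matches += m
        total_len += n
    return total_matches / total_len if total_len else 0.0


def sentence_level(hyps: list[str], refs: list[str]) -> float:
    # Score each segment on its own, then take the mean of the segment scores.
    scores = []
    for hyp, ref in zip(hyps, refs):
        m, n = clipped_matches(hyp, ref)
        scores.append(m / n if n else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


hyps = ["the cat sat on the mat today", "hello"]
refs = ["the cat sat on the mat", "goodbye"]
print(corpus_level(hyps, refs))    # 6/8 = 0.75
print(sentence_level(hyps, refs))  # (6/7 + 0/1) / 2, about 0.43
```

Here the long first segment dominates the pooled counts, so corpus-level aggregation yields 0.75, while averaging per-segment scores gives each segment equal weight and yields about 0.43. This only demonstrates how the two aggregations diverge; it does not reproduce the paper's correlation analysis against human judgements.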