Navigating the metrics maze: Reconciling score magnitudes and accuracies

T Kocmi, V Zouhar, C Federmann, M Post - arXiv preprint arXiv …, 2024 - arxiv.org
Ten years ago a single metric, BLEU, governed progress in machine translation research.
For better or worse, there is no such consensus today, and consequently it is difficult for
researchers to develop and retain the kinds of heuristic intuitions about metric deltas that
drove earlier research and deployment decisions. This paper investigates the "dynamic
range" of a number of modern metrics in an effort to provide a collective understanding of the
meaning of differences in scores both within and among metrics; in other words, we ask …

[CITATION][C] Navigating the metrics maze: Reconciling score magnitudes and accuracies. arXiv preprint

T Kocmi, V Zouhar, C Federmann, M Post - arXiv preprint arXiv:2401.06760, 2024