Experts, errors, and context: A large-scale study of human evaluation for machine translation

M Freitag, G Foster, D Grangier, V Ratnakar… - Transactions of the …, 2021 - direct.mit.edu
Human evaluation of modern high-quality machine translation systems is a difficult problem,
and there is increasing evidence that inadequate evaluation procedures can lead to …

How to do human evaluation: A brief introduction to user studies in NLP

H Schuff, L Vanderlyn, H Adel, NT Vu - Natural Language …, 2023 - cambridge.org
Many research topics in natural language processing (NLP), such as explanation
generation, dialog modeling, or machine translation, require evaluation that goes beyond …

Adequacy–fluency metrics: Evaluating MT in the continuous space model framework

RE Banchs, LF D'Haro, H Li - IEEE/ACM Transactions on Audio …, 2015 - ieeexplore.ieee.org
This work extends and evaluates a two-dimensional automatic evaluation metric for machine
translation, which is designed to operate at the sentence level. The metric is based on the …

Informative manual evaluation of machine translation output

M Popović - 2020 - doras.dcu.ie
This work proposes a new method for manual evaluation of Machine Translation (MT) output
based on marking actual issues in the translated text. The novelty is that the evaluators are …

Extrinsic evaluation of machine translation metrics

N Moghe, T Sherborne, M Steedman… - arXiv preprint arXiv …, 2022 - arxiv.org
Automatic machine translation (MT) metrics are widely used to distinguish the translation
qualities of machine translation systems across relatively large test sets (system-level …

Deep learning for semantic similarity

A Sanborn, J Skryzalin - CS224d: Deep Learning for Natural …, 2015 - cs224d.stanford.edu
Evaluating the semantic similarity of two sentences is a task central to automated
understanding of natural languages. We discuss the problem of semantic similarity and …

Agree to disagree: Analysis of inter-annotator disagreements in human evaluation of machine translation output

M Popović - Proceedings of the 25th Conference on …, 2021 - aclanthology.org
This work describes an analysis of inter-annotator disagreements in human evaluation of
machine translation output. The errors in the analysed texts were marked by multiple …

Machine Translation with Large Language Models: Prompt Engineering for Persian, English, and Russian Directions

N Pourkamali, SE Sharifi - arXiv preprint arXiv:2401.08429, 2024 - arxiv.org
Generative large language models (LLMs) have demonstrated exceptional proficiency in
various natural language processing (NLP) tasks, including machine translation, question …

Ranking vs. regression in machine translation evaluation

K Duh - Proceedings of the Third Workshop on Statistical …, 2008 - aclanthology.org
Automatic evaluation of machine translation (MT) systems is an important research topic for
the advancement of MT technology. Most automatic evaluation methods proposed to date …

Affective decoding for empathetic response generation

C Zeng, G Chen, C Lin, R Li, Z Chen - arXiv preprint arXiv:2108.08102, 2021 - arxiv.org
Understanding speaker's feelings and producing appropriate responses with emotion
connection is a key communicative skill for empathetic dialogue systems. In this paper, we …