Some lessons learned reproducing human evaluation of a data-to-text system

A Belz, C Thomson - Proceedings of the Fourth Workshop on …, 2024 - aclanthology.org

This paper presents an overview of, and the results from, the 2024 Shared Task on
Reproducibility of Evaluations in NLP (ReproNLP'24), following on from three previous …

Cited by 17 · Related articles · All 3 versions

[PDF] aclanthology.org

(Mostly) Automatic Experiment Execution for Human Evaluations of NLP Systems

C Thomson, A Belz - Proceedings of the 17th International Natural …, 2024 - aclanthology.org

Human evaluation is widely considered the most reliable form of evaluation in NLP, but
recent research has shown it to be riddled with mistakes, often as a result of manual …

Cited by 2 · Related articles

[PDF] aclanthology.org

ReproHum #0712-01: Reproducing Human Evaluation of Meaning Preservation in Paraphrase Generation

LN Watson, D Gkatzia - Proceedings of the Fourth Workshop on …, 2024 - aclanthology.org

Reproducibility is a cornerstone of scientific research, ensuring the reliability and
generalisability of findings. The ReproNLP Shared Task on Reproducibility of Evaluations in …

Cited by 5 · Related articles · All 4 versions

[PDF] aclanthology.org

ReproHum #0712-01: Human Evaluation Reproduction Report for “Hierarchical Sketch Induction for Paraphrase Generation”

M Arvan, N Parde - Proceedings of the Fourth Workshop on …, 2024 - aclanthology.org

Human evaluations are indispensable in the development of NLP systems because they
provide direct insights into how effectively these systems meet real-world needs and …

Cited by 1 · Related articles · All 2 versions

[PDF] uic.edu
