S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Abstract: Evaluation practices in natural language generation (NLG) have many known flaws, but improved evaluation approaches are rarely widely adopted. This issue has become …
Text summarization aims to compress long documents into a shorter form that conveys the most important parts of the original document. Despite increased interest in the …
Human assessment remains the most trusted form of evaluation in NLG, but highly diverse approaches and a proliferation of different quality criteria used by researchers make it …
The majority of NLG evaluation relies on automatic metrics, such as BLEU. In this paper, we motivate the need for novel, system- and data-independent automatic evaluation methods …
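For context, BLEU scores n-gram overlap between system outputs and human references. A minimal sketch of corpus-level BLEU using the sacrebleu package follows; the package choice and the toy sentences are illustrative assumptions, not the setup of the paper above.

    # Minimal sketch: corpus-level BLEU with sacrebleu (illustrative only).
    import sacrebleu

    # Hypothetical system outputs and aligned references, for demonstration.
    hypotheses = ["the cat sat on the mat", "there is a dog in the park"]
    references = [
        ["a cat was sitting on the mat", "a dog runs through the park"],  # one reference stream, aligned with hypotheses
    ]

    # corpus_bleu takes the hypotheses and a list of reference streams.
    bleu = sacrebleu.corpus_bleu(hypotheses, references)
    print(f"BLEU = {bleu.score:.2f}")  # reported on a 0-100 scale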
Currently, there is little agreement as to how Natural Language Generation (NLG) systems should be evaluated, with a particularly high degree of variation in the way that human …
This paper provides a comprehensive analysis of the first shared task on End-to-End Natural Language Generation (NLG) and identifies avenues for future research based on the results …
Human evaluation for natural language generation (NLG) often suffers from inconsistent user ratings. While previous research tends to attribute this problem to individual user …
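Rating inconsistency of this kind is often quantified with an inter-annotator agreement statistic such as Cohen's kappa. A minimal sketch with scikit-learn is below; this is an illustrative choice of statistic and library, not the method of the paper above.

    # Minimal sketch: agreement between two raters via Cohen's kappa (illustrative only).
    from sklearn.metrics import cohen_kappa_score

    # Hypothetical 1-5 quality ratings from two annotators on the same outputs.
    rater_a = [5, 3, 4, 2, 5, 1]
    rater_b = [4, 3, 4, 1, 5, 2]

    kappa = cohen_kappa_score(rater_a, rater_b)
    print(f"Cohen's kappa = {kappa:.2f}")  # 1.0 = perfect agreement, 0 = chance level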
As transparency becomes key for robotics and AI, it will be necessary to evaluate the methods through which transparency is provided, including automatically generated natural …
F Huang, H Kwak, J An - Companion Proceedings of the ACM Web …, 2023 - dl.acm.org
Recent studies have exploited advanced generative language models to generate Natural Language Explanations (NLE) for why a certain text could be hateful. We propose the Chain …