Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Abstract Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

TRUE: Re-evaluating factual consistency evaluation

O Honovich, R Aharoni, J Herzig, H Taitelbaum… - arXiv preprint arXiv …, 2022 - arxiv.org
Grounded text generation systems often generate text that contains factual inconsistencies,
hindering their real-world applicability. Automatic factual consistency evaluation may help …

A gold standard methodology for evaluating accuracy in data-to-text systems

C Thomson, E Reiter - arXiv preprint arXiv:2011.03992, 2020 - arxiv.org
Most Natural Language Generation systems need to produce accurate texts. We propose a
methodology for high-quality human evaluation of the accuracy of generated texts, which is …

Fact-Checking Statistical Claims with Tables

M Saeed, P Papotti - IEEE Data Eng. Bull., 2021 - eurecom.fr
The surge of misinformation poses a serious problem for fact-checkers. Several initiatives for
manual fact-checking have stepped up to combat this ordeal. However, computational …

Text-in-context: Token-level error detection for table-to-text generation

Z Kasner, S Mille, O Dušek - Proceedings of the 14th International …, 2021 - aclanthology.org
We present our Charles-UPF submission for the Shared Task on Evaluating Accuracy in
Generated Texts at INLG 2021. Our system can detect the errors automatically using a …

Beyond Reference-Based Metrics: Analyzing Behaviors of Open LLMs on Data-to-Text Generation

Z Kasner, O Dušek - arXiv preprint arXiv:2401.10186, 2024 - arxiv.org
We investigate to which extent open large language models (LLMs) can generate coherent
and relevant text from structured data. To prevent bias from benchmarks leaked into LLM …

Shared task in evaluating accuracy: Leveraging pre-annotations in the validation process

N Garneau, L Lamontagne - Proceedings of the 14th International …, 2021 - aclanthology.org
We hereby present our submission to the Shared Task in Evaluating Accuracy at the INLG
2021 Conference. Our evaluation protocol relies on three main components; rules and text …

Grounding NBA matchup summaries

T Nomoto - Proceedings of the 14th International Conference on …, 2021 - aclanthology.org
The present paper summarizes an attempt we made to meet a shared task challenge on
grounding machine-generated summaries of NBA matchups (https://github.com/ehudreiter …

Employing Transformers and Humans for Textual-Claim Verification

M Saeed - 2022 - theses.hal.science
Throughout the last years, there has been a surge in false news spreading across the public.
Despite efforts made in alleviating "fake news", there remains a lot of ordeals when trying to …

A Review of Proposals for Improvements in Evaluation of Natural Language Generation

JP Moilanen - 2023 - helda.helsinki.fi
The evaluation methodology of natural language generation ("NLG" = systems generating
text based on some kind of input) is currently in a state of a hopefully productive confusion …