Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Abstract Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

TRUE: Re-evaluating factual consistency evaluation

O Honovich, R Aharoni, J Herzig, H Taitelbaum… - arXiv preprint arXiv …, 2022 - arxiv.org
Grounded text generation systems often generate text that contains factual inconsistencies,
hindering their real-world applicability. Automatic factual consistency evaluation may help …

A gold standard methodology for evaluating accuracy in data-to-text systems

C Thomson, E Reiter - arXiv preprint arXiv:2011.03992, 2020 - arxiv.org
Most Natural Language Generation systems need to produce accurate texts. We propose a
methodology for high-quality human evaluation of the accuracy of generated texts, which is …

Fact-Checking Statistical Claims with Tables

M Saeed, P Papotti - IEEE Data Eng. Bull., 2021 - eurecom.fr
The surge of misinformation poses a serious problem for fact-checkers. Several initiatives for
manual fact-checking have stepped up to combat this ordeal. However, computational …

Text-in-context: Token-level error detection for table-to-text generation

Z Kasner, S Mille, O Dušek - Proceedings of the 14th International …, 2021 - aclanthology.org
We present our Charles-UPF submission for the Shared Task on Evaluating Accuracy in
Generated Texts at INLG 2021. Our system can detect the errors automatically using a …

Beyond Reference-Based Metrics: Analyzing Behaviors of Open LLMs on Data-to-Text Generation

Z Kasner, O Dušek - arXiv preprint arXiv:2401.10186, 2024 - arxiv.org
We investigate to which extent open large language models (LLMs) can generate coherent
and relevant text from structured data. To prevent bias from benchmarks leaked into LLM …

Shared task in evaluating accuracy: Leveraging pre-annotations in the validation process

N Garneau, L Lamontagne - Proceedings of the 14th International …, 2021 - aclanthology.org
We hereby present our submission to the Shared Task in Evaluating Accuracy at the INLG
2021 Conference. Our evaluation protocol relies on three main components; rules and text …

Grounding NBA matchup summaries

T Nomoto - Proceedings of the 14th International Conference on …, 2021 - aclanthology.org
The present paper summarizes an attempt we made to meet a shared task challenge on
grounding machine-generated summaries of NBA matchups (https://github.com/ehudreiter …

Employing Transformers and Humans for Textual-Claim Verification

M Saeed - 2022 - theses.hal.science
Throughout the last years, there has been a surge in false news spreading across the public.
Despite efforts made in alleviating "fake news", there remains a lot of ordeals when trying to …

A Review of Proposals for Improvements in Evaluation of Natural Language Generation

JP Moilanen - 2023 - helda.helsinki.fi
The evaluation methodology of natural language generation ("NLG" = systems generating
text based on some kind of input) is currently in a state of a hopefully productive confusion …