Survey of hallucination in natural language generation

Z Ji, N Lee, R Frieske, T Yu, D Su, Y Xu, E Ishii… - ACM Computing …, 2023 - dl.acm.org
Natural Language Generation (NLG) has improved exponentially in recent years thanks to
the development of sequence-to-sequence deep learning technologies such as Transformer …

A survey of natural language generation

C Dong, Y Li, H Gong, M Chen, J Li, Y Shen… - ACM Computing …, 2022 - dl.acm.org
This article offers a comprehensive review of the research on Natural Language Generation
(NLG) over the past two decades, especially in relation to data-to-text generation and text-to …

G-Eval: NLG evaluation using GPT-4 with better human alignment

Y Liu, D Iter, Y Xu, S Wang, R Xu, C Zhu - arXiv preprint arXiv:2303.16634, 2023 - arxiv.org
The quality of texts generated by natural language generation (NLG) systems is hard to
measure automatically. Conventional reference-based metrics, such as BLEU and ROUGE …

Siren's song in the AI ocean: a survey on hallucination in large language models

Y Zhang, Y Li, L Cui, D Cai, L Liu, T Fu… - arXiv preprint arXiv …, 2023 - arxiv.org
While large language models (LLMs) have demonstrated remarkable capabilities across a
range of downstream tasks, a significant concern revolves around their propensity to exhibit …

Benchmarking large language models in retrieval-augmented generation

J Chen, H Lin, X Han, L Sun - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Retrieval-Augmented Generation (RAG) is a promising approach for mitigating the
hallucination of large language models (LLMs). However, existing research lacks rigorous …

SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization

P Laban, T Schnabel, PN Bennett… - Transactions of the …, 2022 - direct.mit.edu
In the summarization domain, a key requirement for summaries is to be factually consistent
with the input document. Previous work has found that natural language inference (NLI) …

Towards a unified multi-dimensional evaluator for text generation

M Zhong, Y Liu, D Yin, Y Mao, Y Jiao, P Liu… - arXiv preprint arXiv …, 2022 - arxiv.org
Multi-dimensional evaluation is the dominant paradigm for human evaluation in Natural
Language Generation (NLG), i.e., evaluating the generated text from multiple explainable …

RARR: Researching and revising what language models say, using language models

L Gao, Z Dai, P Pasupat, A Chen, AT Chaganty… - arXiv preprint arXiv …, 2022 - arxiv.org
Language models (LMs) now excel at many tasks such as few-shot learning, question
answering, reasoning, and dialog. However, they sometimes generate unsupported or …

Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Abstract Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

A culturally sensitive test to evaluate nuanced GPT hallucination

TR McIntosh, T Liu, T Susnjak, P Watters… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
The Generative Pre-trained Transformer (GPT) models, renowned for generating human-like
text, occasionally produce "hallucinations": outputs that diverge from human expectations …