Retrieval-augmented generation for large language models: A survey

Y Gao, Y Xiong, X Gao, K Jia, J Pan, Y Bi, Y Dai… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) demonstrate powerful capabilities, but they still face
challenges in practical applications, such as hallucinations, slow knowledge updates, and …

Survey of hallucination in natural language generation

Z Ji, N Lee, R Frieske, T Yu, D Su, Y Xu, E Ishii… - ACM Computing …, 2023 - dl.acm.org
Natural Language Generation (NLG) has improved exponentially in recent years thanks to
the development of sequence-to-sequence deep learning technologies such as Transformer …

Documenting large webtext corpora: A case study on the colossal clean crawled corpus

J Dodge, M Sap, A Marasović, W Agnew… - arXiv preprint arXiv …, 2021 - arxiv.org
Large language models have led to remarkable progress on many NLP tasks, and
researchers are turning to ever-larger text corpora to train them. Some of the largest corpora …

Survey on factuality in large language models: Knowledge, retrieval and domain-specificity

C Wang, X Liu, Y Yue, X Tang, T Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
This survey addresses the crucial issue of factuality in Large Language Models (LLMs). As
LLMs find applications across diverse domains, the reliability and accuracy of their outputs …

On faithfulness and factuality in abstractive summarization

J Maynez, S Narayan, B Bohnet… - arXiv preprint arXiv …, 2020 - arxiv.org
It is well known that the standard likelihood training and approximate decoding objectives in
neural text generation models lead to less human-like responses for open-ended tasks such …

Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

ToTTo: A controlled table-to-text generation dataset

AP Parikh, X Wang, S Gehrmann, M Faruqui… - arXiv preprint arXiv …, 2020 - arxiv.org
We present ToTTo, an open-domain English table-to-text dataset with over 120,000 training
examples that proposes a controlled generation task: given a Wikipedia table and a set of …

Knowledge graph based synthetic corpus generation for knowledge-enhanced language model pre-training

O Agarwal, H Ge, S Shakeri, R Al-Rfou - arXiv preprint arXiv:2010.12688, 2020 - arxiv.org
Prior work on Data-To-Text Generation, the task of converting knowledge graph (KG) triples
into natural text, focused on domain-specific benchmark datasets. In this paper, however, we …

Embers of autoregression: Understanding large language models through the problem they are trained to solve

RT McCoy, S Yao, D Friedman, M Hardy… - arXiv preprint arXiv …, 2023 - arxiv.org
The widespread adoption of large language models (LLMs) makes it important to recognize
their strengths and limitations. We argue that in order to develop a holistic understanding of …

Exploring the benefits of training expert language models over instruction tuning

J Jang, S Kim, S Ye, D Kim… - International …, 2023 - proceedings.mlr.press
Recently, Language Models (LMs) instruction-tuned on multiple tasks, also known
as multitask-prompted fine-tuning (MT), have shown capabilities to generalize to unseen …