Reading Subtext: Evaluating Large Language Models on Short Story Summarization with Writers

M Subbiah, S Zhang, LB Chilton… - arXiv preprint arXiv …, 2024 - arxiv.org
We evaluate recent large language models (LLMs) on the challenging task of summarizing
short stories, which can be lengthy and include nuanced subtext or scrambled timelines …

FABLES: Evaluating faithfulness and content selection in book-length summarization

Y Kim, Y Chang, M Karpinska, A Garimella… - arXiv preprint arXiv …, 2024 - arxiv.org
While long-context large language models (LLMs) can technically summarize book-length
documents (> 100K tokens), the length and complexity of the documents have so far …

Multi-Modal and Multi-Agent Systems Meet Rationality: A Survey

B Jiang, Y Xie, X Wang, WJ Su, CJ Taylor… - arXiv preprint arXiv …, 2024 - arxiv.org
Rationality is the quality of being guided by reason, characterized by logical thinking and
decision-making that align with evidence and logical rules. This quality is essential for …

A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models

H Zhang, PS Yu, J Zhang - arXiv preprint arXiv:2406.11289, 2024 - arxiv.org
Text summarization research has undergone several significant transformations with the
advent of deep neural networks, pre-trained language models (PLMs), and recent large …

Learning to Refine with Fine-Grained Natural Language Feedback

M Wadhwa, X Zhao, JJ Li, G Durrett - arXiv preprint arXiv:2407.02397, 2024 - arxiv.org
Recent work has explored the capability of large language models (LLMs) to identify and
correct errors in LLM-generated responses. These refinement approaches frequently …

STORYSUMM: Evaluating Faithfulness in Story Summarization

M Subbiah, F Ladhak, A Mishra, G Adams… - arXiv preprint arXiv …, 2024 - arxiv.org
Human evaluation has been the gold standard for checking faithfulness in abstractive
summarization. However, with a challenging source domain like narrative, multiple …

FIZZ: Factual Inconsistency Detection by Zoom-in Summary and Zoom-out Document

J Yang, S Yoon, B Kim, H Lee - arXiv preprint arXiv:2404.11184, 2024 - arxiv.org
With the advent of pre-trained language models, there have been notable
advancements in abstractive summarization systems. Simultaneously, a considerable …

Can We Trust the Performance Evaluation of Uncertainty Estimation Methods in Text Summarization?

J He, R Yang, L Yu, C Li, R Jia, F Chen, M Jin… - arXiv preprint arXiv …, 2024 - arxiv.org
Text summarization, a key natural language generation (NLG) task, is vital in various
domains. However, the high cost of inaccurate summaries in risk-critical applications …

Delving into ChatGPT usage in academic writing through excess vocabulary

D Kobak, RG Márquez, EÁ Horvát, J Lause - arXiv preprint arXiv …, 2024 - arxiv.org
Recent large language models (LLMs) can generate and revise text with human-level
performance, and have been widely commercialized in systems like ChatGPT. These …

Detecting Errors through Ensembling Prompts (DEEP): An End-to-End LLM Framework for Detecting Factual Errors

A Chandler, D Surve, H Su - arXiv preprint arXiv:2406.13009, 2024 - arxiv.org
Accurate text summarization is one of the most common and important tasks performed by
large language models (LLMs), where the costs of human review for an entire document may be …