Needle in a haystack: An analysis of high-agreement workers on MTurk for summarization

L Zhang, S Mille, Y Hou, D Deutsch, E Clark… - arXiv preprint arXiv …, 2022 - arxiv.org
To prevent the costly and inefficient use of resources on low-quality annotations, we want a
method for creating a pool of dependable annotators who can effectively complete difficult …

The State and Fate of Summarization Datasets

N Dahan, G Stanovsky - arXiv preprint arXiv:2411.04585, 2024 - arxiv.org
Automatic summarization has consistently attracted attention, due to its versatility and wide
application in various downstream tasks. Despite its popularity, we find that annotation …

Revisiting the gold standard: Grounding summarization evaluation with robust human evaluation

Y Liu, AR Fabbri, P Liu, Y Zhao, L Nan, R Han… - arXiv preprint arXiv …, 2022 - arxiv.org
Human evaluation is the foundation upon which the evaluation of both summarization
systems and automatic metrics rests. However, existing human evaluation studies for …

LongEval: Guidelines for human evaluation of faithfulness in long-form summarization

K Krishna, E Bransom, B Kuehl, M Iyyer… - arXiv preprint arXiv …, 2023 - arxiv.org
While human evaluation remains best practice for accurately judging the faithfulness of
automatically generated summaries, few solutions exist to address the increased difficulty …

CNewSum: a large-scale summarization dataset with human-annotated adequacy and deducibility level

D Wang, J Chen, X Wu, H Zhou, L Li - … 13–17, 2021, Proceedings, Part I 10, 2021 - Springer
Automatic text summarization aims to produce a brief but crucial summary for the input
documents. Both extractive and abstractive methods have witnessed great success in …

HighRES: Highlight-based reference-less evaluation of summarization

S Narayan, A Vlachos - arXiv preprint arXiv:1906.01361, 2019 - arxiv.org
There has been substantial progress in summarization research enabled by the availability
of novel, often large-scale, datasets and recent advances on neural network-based …

How well do you know your summarization datasets?

P Tejaswin, D Naik, P Liu - arXiv preprint arXiv:2106.11388, 2021 - arxiv.org
State-of-the-art summarization systems are trained and evaluated on massive datasets
scraped from the web. Despite their prevalence, we know very little about the underlying …

UniSumEval: Towards Unified, Fine-Grained, Multi-Dimensional Summarization Evaluation for LLMs

Y Lee, T Yun, J Cai, H Su, H Song - arXiv preprint arXiv:2409.19898, 2024 - arxiv.org
Existing benchmarks for summarization quality evaluation often lack diverse input scenarios,
focus on narrowly defined dimensions (e.g., faithfulness), and struggle with subjective and …

RetrievalSum: A retrieval enhanced framework for abstractive summarization

C An, M Zhong, Z Geng, J Yang, X Qiu - arXiv preprint arXiv:2109.07943, 2021 - arxiv.org
Existing summarization systems mostly generate summaries relying purely on the content of
the source document. However, even for humans, we usually need some references or …

TeSum: Human-generated abstractive summarization corpus for Telugu

A Urlana, N Surange, P Baswani, P Ravva… - Proceedings of the …, 2022 - aclanthology.org
Expert human annotation for summarization is an expensive task and cannot be
done at huge scales. But with this work, we show that even with a crowd-sourced summary …