News summarization and evaluation in the era of GPT-3

T Goyal, JJ Li, G Durrett - arXiv preprint arXiv:2209.12356, 2022 - arxiv.org
The recent success of zero- and few-shot prompting with models like GPT-3 has led to a
paradigm shift in NLP research. In this paper, we study its impact on text summarization …

SummEval: Re-evaluating summarization evaluation

AR Fabbri, W Kryściński, B McCann, C Xiong… - Transactions of the …, 2021 - direct.mit.edu
The scarcity of comprehensive up-to-date studies on evaluation metrics for text
summarization and the lack of consensus regarding evaluation protocols continue to inhibit …

On faithfulness and factuality in abstractive summarization

J Maynez, S Narayan, B Bohnet… - arXiv preprint arXiv …, 2020 - arxiv.org
It is well known that the standard likelihood training and approximate decoding objectives in
neural text generation models lead to less human-like responses for open-ended tasks such …

Revisiting the gold standard: Grounding summarization evaluation with robust human evaluation

Y Liu, AR Fabbri, P Liu, Y Zhao, L Nan, R Han… - arXiv preprint arXiv …, 2022 - arxiv.org
Human evaluation is the foundation upon which the evaluation of both summarization
systems and automatic metrics rests. However, existing human evaluation studies for …

LongEval: Guidelines for human evaluation of faithfulness in long-form summarization

K Krishna, E Bransom, B Kuehl, M Iyyer… - arXiv preprint arXiv …, 2023 - arxiv.org
While human evaluation remains best practice for accurately judging the faithfulness of
automatically-generated summaries, few solutions exist to address the increased difficulty …

Summary explorer: Visualizing the state of the art in text summarization

S Syed, T Yousef, K Al-Khatib, S Jänicke… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper introduces Summary Explorer, a new tool to support the manual inspection of text
summarization systems by compiling the outputs of 55~ state-of-the-art single document …

Generating representative headlines for news stories

X Gu, Y Mao, J Han, J Liu, Y Wu, C Yu… - Proceedings of The …, 2020 - dl.acm.org
Millions of news articles are published online every day, which can be overwhelming for
readers to follow. Grouping articles that are reporting the same event into news stories is a …

An investigation of evaluation methods in automatic medical note generation

AB Abacha, W Yim, G Michalopoulos… - Findings of the …, 2023 - aclanthology.org
Recent studies on automatic note generation have shown that doctors can save significant
amounts of time when using automatic clinical note generation (Knoll et al., 2022) …

FFCI: A framework for interpretable automatic evaluation of summarization

F Koto, T Baldwin, JH Lau - Journal of Artificial Intelligence Research, 2022 - jair.org
In this paper, we propose FFCI, a framework for fine-grained summarization evaluation that
comprises four elements: faithfulness (degree of factual consistency with the source), focus …

An investigation of evaluation metrics for automated medical note generation

AB Abacha, W Yim, G Michalopoulos, T Lin - arXiv preprint arXiv …, 2023 - arxiv.org
Recent studies on automatic note generation have shown that doctors can save significant
amounts of time when using automatic clinical note generation (Knoll et al., 2022) …