Repairing the cracked foundation: A survey of obstacles in evaluation practices for generated text

S Gehrmann, E Clark, T Sellam - Journal of Artificial Intelligence Research, 2023 - jair.org
Evaluation practices in natural language generation (NLG) have many known flaws,
but improved evaluation approaches are rarely widely adopted. This issue has become …

Deep reinforcement and transfer learning for abstractive text summarization: A review

A Alomari, N Idris, AQM Sabri, I Alsmadi - Computer Speech & Language, 2022 - Elsevier
Automatic Text Summarization (ATS) is an important area in Natural Language
Processing (NLP) with the goal of shortening a long text into a more compact version by …

SummaC: Re-Visiting NLI-based Models for Inconsistency Detection in Summarization

P Laban, T Schnabel, PN Bennett… - Transactions of the …, 2022 - direct.mit.edu
In the summarization domain, a key requirement for summaries is to be factually consistent
with the input document. Previous work has found that natural language inference (NLI) …
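
The NLI-based idea behind SummaC can be illustrated with an off-the-shelf entailment model: treat each summary sentence as a hypothesis, score it against the document's sentences, and aggregate the entailment probabilities. A minimal sketch follows, assuming an MNLI checkpoint and naive sentence splitting; SummaC itself adds granularity choices and, in SummaC-Conv, a learned aggregator over the score matrix.

    # Minimal sketch of NLI-based consistency scoring (assumed MNLI checkpoint).
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("roberta-large-mnli")
    nli = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

    def entailment(premise: str, hypothesis: str) -> float:
        inputs = tok(premise, hypothesis, return_tensors="pt", truncation=True)
        with torch.no_grad():
            probs = nli(**inputs).logits.softmax(dim=-1)[0]
        # For roberta-large-mnli, label index 2 corresponds to ENTAILMENT.
        return probs[nli.config.label2id.get("ENTAILMENT", 2)].item()

    def consistency(document: str, summary: str) -> float:
        doc_sents = [s.strip() for s in document.split(".") if s.strip()]
        sum_sents = [s.strip() for s in summary.split(".") if s.strip()]
        # SummaC-ZS style aggregation: max over document sentences per summary
        # sentence, then mean over summary sentences.
        per_sent = [max(entailment(d, s) for d in doc_sents) for s in sum_sents]
        return sum(per_sent) / len(per_sent)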

SummEval: Re-evaluating summarization evaluation

AR Fabbri, W Kryściński, B McCann, C Xiong… - Transactions of the …, 2021 - direct.mit.edu
The scarcity of comprehensive up-to-date studies on evaluation metrics for text
summarization and the lack of consensus regarding evaluation protocols continue to inhibit …
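
Part of what SummEval re-examines is that the dominant reference-based metrics (the ROUGE family among them) reduce to n-gram overlap. A small illustration using the rouge_score package (the package choice and the example sentences are assumptions, not taken from the paper) shows why a factually wrong summary can still score highly.

    # Why lexical overlap can miss factual errors: one content word changes,
    # ROUGE barely moves. Uses Google's rouge_score reference implementation.
    from rouge_score import rouge_scorer

    scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)

    reference = "The company reported a profit of 3 million dollars in 2020."
    faithful = "In 2020 the company reported a 3 million dollar profit."
    unfaithful = "The company reported a loss of 3 million dollars in 2020."

    for name, candidate in [("faithful", faithful), ("unfaithful", unfaithful)]:
        scores = scorer.score(reference, candidate)
        print(name, {k: round(v.fmeasure, 3) for k, v in scores.items()})
    # The unfaithful variant swaps "profit" for "loss", yet its ROUGE scores
    # stay close to those of the faithful one.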

Recursively summarizing books with human feedback

J Wu, L Ouyang, DM Ziegler, N Stiennon… - arXiv preprint arXiv …, 2021 - arxiv.org
A major challenge for scaling machine learning is training models to perform tasks that are
very difficult or time-consuming for humans to evaluate. We present progress on this …
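
The recursive decomposition at the core of that work, summarizing fixed-size chunks and then summarizing the concatenated summaries until a single summary remains, is easy to sketch without the human-feedback training loop. In the sketch below, the character-based chunking, chunk size, and summarization checkpoint are all assumptions; Wu et al. additionally fine-tune the summarizer with reinforcement learning from human feedback.

    # Sketch of recursive (hierarchical) summarization; the RLHF component
    # of the paper is omitted.
    from transformers import pipeline

    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

    def summarize_recursively(text: str, chunk_chars: int = 3000, max_len: int = 150) -> str:
        if len(text) <= chunk_chars:
            return summarizer(text, max_length=max_len, truncation=True)[0]["summary_text"]
        # Summarize each chunk, then recurse on the concatenated chunk summaries.
        chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
        partial = [summarizer(c, max_length=max_len, truncation=True)[0]["summary_text"]
                   for c in chunks]
        return summarize_recursively(" ".join(partial), chunk_chars, max_len)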

CTRL: A conditional transformer language model for controllable generation

NS Keskar, B McCann, LR Varshney, C Xiong… - arXiv preprint arXiv …, 2019 - arxiv.org
Large-scale language models show promising text generation capabilities, but users cannot
easily control particular aspects of the generated text. We release CTRL, a 1.63 billion …
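
CTRL's control mechanism is a prompting convention: a control code is prepended to the prompt and steers the domain or style of the continuation. A rough usage sketch with the transformers checkpoint is below; the specific control code and generation settings are assumptions, not prescriptions from the paper.

    # Sketch of conditional generation with CTRL: the leading control code
    # (e.g. "Books", "Reviews", "Links") selects the domain of the output.
    from transformers import CTRLLMHeadModel, CTRLTokenizer

    tokenizer = CTRLTokenizer.from_pretrained("Salesforce/ctrl")
    model = CTRLLMHeadModel.from_pretrained("Salesforce/ctrl")

    prompt = "Books In a hole in the ground there lived"  # control code + text
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_length=60, repetition_penalty=1.2,
                             do_sample=True)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))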

Asking and answering questions to evaluate the factual consistency of summaries

A Wang, K Cho, M Lewis - arXiv preprint arXiv:2004.04228, 2020 - arxiv.org
Practical applications of abstractive summarization models are limited by frequent factual
inconsistencies with respect to their input. Existing automatic evaluation metrics for …
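
The QA-based family of metrics (QAGS here, and QuestEval, FEQA, and QAFactEval below) shares one loop: generate questions about the summary, answer them from the summary and from the source document, and compare the answers. A stripped-down sketch follows; the question-generation checkpoint and the plain token-F1 comparison are assumptions standing in for the papers' more careful question filtering and answer-overlap components.

    # Sketch of QA-based factual-consistency scoring (QAGS-style loop).
    from collections import Counter
    from transformers import pipeline

    qg = pipeline("text2text-generation", model="valhalla/t5-base-qg-hl")  # assumed QG model
    qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

    def token_f1(a: str, b: str) -> float:
        ta, tb = a.lower().split(), b.lower().split()
        common = sum((Counter(ta) & Counter(tb)).values())
        if not common:
            return 0.0
        p, r = common / len(ta), common / len(tb)
        return 2 * p * r / (p + r)

    def qa_consistency(document: str, summary: str) -> float:
        questions = [o["generated_text"] for o in qg(summary)]
        scores = []
        for q in questions:
            ans_from_summary = qa(question=q, context=summary)["answer"]
            ans_from_document = qa(question=q, context=document)["answer"]
            # Agreement between the two answers is taken as evidence that the
            # summary's claim is supported by the source.
            scores.append(token_f1(ans_from_summary, ans_from_document))
        return sum(scores) / len(scores) if scores else 0.0

The papers differ mainly in how the questions are generated and filtered and in how answer agreement is measured; QAFactEval, for instance, replaces the token-overlap comparison with a learned answer-equivalence scorer.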

QuestEval: Summarization asks for fact-based evaluation

T Scialom, PA Dray, P Gallinari, S Lamprier… - arXiv preprint arXiv …, 2021 - arxiv.org
Summarization evaluation remains an open research problem: current metrics such as
ROUGE are known to be limited and to correlate poorly with human judgments. To alleviate …

FEQA: A question answering evaluation framework for faithfulness assessment in abstractive summarization

E Durmus, H He, M Diab - arXiv preprint arXiv:2005.03754, 2020 - arxiv.org
Neural abstractive summarization models are prone to generate content inconsistent with
the source document, i.e., unfaithful. Existing automatic metrics do not capture such mistakes …

QAFactEval: Improved QA-based factual consistency evaluation for summarization

AR Fabbri, CS Wu, W Liu, C Xiong - arXiv preprint arXiv:2112.08542, 2021 - arxiv.org
Factual consistency is an essential quality of text summarization models in practical settings.
Existing work in evaluating this dimension can be broadly categorized into two lines of …