Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems

P Laban, AR Fabbri, C Xiong, CS Wu - arXiv preprint arXiv:2407.01370, 2024 - arxiv.org
LLMs and RAG systems are now capable of handling millions of input tokens or more.
However, evaluating the output quality of such systems on long-context tasks remains …

From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models

KH Huang, HP Chan, YR Fung, H Qiu, M Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Data visualization in the form of charts plays a pivotal role in data analysis, offering critical
insights and aiding in informed decision-making. Automatic chart understanding has …

Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization

Y Liu, AR Fabbri, J Chen, Y Zhao, S Han, S Joty… - arXiv preprint arXiv …, 2023 - arxiv.org
While large language models (LLMs) already achieve strong performance on standard
generic summarization benchmarks, their performance on more complex summarization …

Verbosity Bias in Preference Labeling by Large Language Models

K Saito, A Wachi, K Wataoka, Y Akimoto - arXiv preprint arXiv:2310.10076, 2023 - arxiv.org
In recent years, Large Language Models (LLMs) have witnessed a remarkable surge in
prevalence, altering the landscape of natural language processing and machine learning …

Are You Sure? Challenging LLMs Leads to Performance Drops in the FlipFlop Experiment

P Laban, L Murakhovs'ka, C Xiong, CS Wu - arXiv preprint arXiv …, 2023 - arxiv.org
The interactive nature of Large Language Models (LLMs) theoretically allows models to
refine and improve their answers, yet systematic analysis of the multi-turn behavior of LLMs …

Fair Abstractive Summarization of Diverse Perspectives

Y Zhang, N Zhang, Y Liu, A Fabbri, J Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
People from different social and demographic groups express diverse perspectives and
conflicting opinions on a broad set of topics such as product reviews, healthcare, law, and …

Can LLMs Produce Faithful Explanations for Fact-Checking? Towards Faithful Explainable Fact-Checking via Multi-Agent Debate

K Kim, S Lee, KH Huang, HP Chan, M Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Fact-checking research has extensively explored verification but less so the generation of
natural-language explanations, crucial for user trust. While Large Language Models (LLMs) …

Open-World Evaluation for Retrieving Diverse Perspectives

HT Chen, E Choi - arXiv preprint arXiv:2409.18110, 2024 - arxiv.org
We study retrieving a set of documents that covers various perspectives on a complex and
contentious question (e.g., will ChatGPT do more harm than good?). We curate a Benchmark …

VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models

H Qiu, W Hu, ZY Dou, N Peng - arXiv preprint arXiv:2404.13874, 2024 - arxiv.org
Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the
models generate plausible-sounding but factually incorrect outputs, undermining their …

A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models

H Zhang, PS Yu, J Zhang - arXiv preprint arXiv:2406.11289, 2024 - arxiv.org
Text summarization research has undergone several significant transformations with the
advent of deep neural networks, pre-trained language models (PLMs), and recent large …