Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems

P Laban, AR Fabbri, C Xiong, CS Wu - arXiv preprint arXiv:2407.01370, 2024 - arxiv.org
LLMs and RAG systems are now capable of handling millions of input tokens or more.
However, evaluating the output quality of such systems on long-context tasks remains …

From Pixels to Insights: A Survey on Automatic Chart Understanding in the Era of Large Foundation Models

KH Huang, HP Chan, YR Fung, H Qiu, M Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
Data visualization in the form of charts plays a pivotal role in data analysis, offering critical
insights and aiding in informed decision-making. Automatic chart understanding has …

Benchmarking Generation and Evaluation Capabilities of Large Language Models for Instruction Controllable Summarization

Y Liu, AR Fabbri, J Chen, Y Zhao, S Han, S Joty… - arXiv preprint arXiv …, 2023 - arxiv.org
While large language models (LLMs) already achieve strong performance on standard
generic summarization benchmarks, their performance on more complex summarization …

Verbosity Bias in Preference Labeling by Large Language Models

K Saito, A Wachi, K Wataoka, Y Akimoto - arXiv preprint arXiv:2310.10076, 2023 - arxiv.org
In recent years, Large Language Models (LLMs) have witnessed a remarkable surge in
prevalence, altering the landscape of natural language processing and machine learning …

Are You Sure? Challenging LLMs Leads to Performance Drops in the FlipFlop Experiment

P Laban, L Murakhovs'ka, C Xiong, CS Wu - arXiv preprint arXiv …, 2023 - arxiv.org
The interactive nature of Large Language Models (LLMs) theoretically allows models to
refine and improve their answers, yet systematic analysis of the multi-turn behavior of LLMs …

Fair Abstractive Summarization of Diverse Perspectives

Y Zhang, N Zhang, Y Liu, A Fabbri, J Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
People from different social and demographic groups express diverse perspectives and
conflicting opinions on a broad set of topics such as product reviews, healthcare, law, and …

Can LLMs Produce Faithful Explanations for Fact-Checking? Towards Faithful Explainable Fact-Checking via Multi-Agent Debate

K Kim, S Lee, KH Huang, HP Chan, M Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Fact-checking research has extensively explored verification but less so the generation of
natural-language explanations, crucial for user trust. While Large Language Models (LLMs) …

Open-World Evaluation for Retrieving Diverse Perspectives

HT Chen, E Choi - arXiv preprint arXiv:2409.18110, 2024 - arxiv.org
We study retrieving a set of documents that covers various perspectives on a complex and
contentious question (e.g., will ChatGPT do more harm than good?). We curate a Benchmark …

VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models

H Qiu, W Hu, ZY Dou, N Peng - arXiv preprint arXiv:2404.13874, 2024 - arxiv.org
Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the
models generate plausible-sounding but factually incorrect outputs, undermining their …

A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models

H Zhang, PS Yu, J Zhang - arXiv preprint arXiv:2406.11289, 2024 - arxiv.org
Text summarization research has undergone several significant transformations with the
advent of deep neural networks, pre-trained language models (PLMs), and recent large …