A Survey of Confidence Estimation and Calibration in Large Language Models

J Geng, F Cai, Y Wang, H Koeppl… - Proceedings of the …, 2024 - aclanthology.org
Large language models (LLMs) have demonstrated remarkable capabilities across a wide
range of tasks in various domains. Despite their impressive performance, they can be …

Factcheck-bench: Fine-grained evaluation benchmark for automatic fact-checkers

Y Wang, RG Reddy, Z Mujahid, A Arora… - Findings of the …, 2024 - aclanthology.org
The increased use of large language models (LLMs) across a variety of real-world
applications calls for mechanisms to verify the factual accuracy of their outputs. In this work …

Adapting fake news detection to the era of large language models

J Su, C Cardie, P Nakov - arXiv preprint arXiv:2311.04917, 2023 - arxiv.org
In the age of large language models (LLMs) and the widespread adoption of AI-driven
content creation, the landscape of information dissemination has witnessed a paradigm shift …

Large language models as reliable knowledge bases?

D Zheng, M Lapata, JZ Pan - arXiv preprint arXiv:2407.13578, 2024 - arxiv.org
The NLP community has recently shown a growing interest in leveraging Large Language
Models (LLMs) for knowledge-intensive tasks, viewing LLMs as potential knowledge bases …

Generative large language models in automated fact-checking: A survey

I Vykopal, M Pikuliak, S Ostermann, M Šimko - arXiv preprint arXiv …, 2024 - arxiv.org
The dissemination of false information on online platforms presents a serious societal
challenge. While manual fact-checking remains crucial, Large Language Models (LLMs) …

" Sorry, Come Again?" Prompting--Enhancing Comprehension and Diminishing Hallucination with [PAUSE]-injected Optimal Paraphrasing

V Rawte, SM Tonmoy, SM Zaman, P Priya… - arXiv preprint arXiv …, 2024 - arxiv.org
Hallucination has emerged as the most vulnerable aspect of contemporary Large Language
Models (LLMs). In this paper, we introduce Sorry, Come Again (SCA) prompting, aimed …

Medico: Towards Hallucination Detection and Correction with Multi-source Evidence Fusion

X Zhao, J Yu, Z Liu, J Wang, D Li, Y Chen, B Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Hallucinations are prevalent in Large Language Models (LLMs), where the
generated content is coherent but factually incorrect, which inflicts a heavy blow on the …

OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs

Y Wang, M Wang, H Iqbal, G Georgiev, J Geng… - arXiv preprint arXiv …, 2024 - arxiv.org
The increased use of large language models (LLMs) across a variety of real-world
applications calls for mechanisms to verify the factual accuracy of their outputs. Difficulties lie …

OpenFactCheck: A Unified Framework for Factuality Evaluation of LLMs

H Iqbal, Y Wang, M Wang, G Georgiev, J Geng… - arXiv preprint arXiv …, 2024 - arxiv.org
The increased use of large language models (LLMs) across a variety of real-world
applications calls for automatic tools to check the factual accuracy of their outputs, as LLMs …

Loki: An Open-Source Tool for Fact Verification

H Li, X Han, H Wang, Y Wang, M Wang, R Xing… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Loki, an open-source tool designed to address the growing problem of
misinformation. Loki adopts a human-centered approach, striking a balance between the …