A survey of language model confidence estimation and calibration

J Geng, F Cai, Y Wang, H Koeppl, P Nakov… - arXiv preprint arXiv …, 2023 - arxiv.org
Language models (LMs) have demonstrated remarkable capabilities across a wide range of
tasks in various domains. Despite their impressive performance, the reliability of their output …

Can LLMs express their uncertainty? An empirical evaluation of confidence elicitation in LLMs

M Xiong, Z Hu, X Lu, Y Li, J Fu, J He, B Hooi - arXiv preprint arXiv …, 2023 - arxiv.org
The task of empowering large language models (LLMs) to accurately express their
confidence, referred to as confidence elicitation, is essential in ensuring reliable and …

A Survey of Confidence Estimation and Calibration in Large Language Models

J Geng, F Cai, Y Wang, H Koeppl… - Proceedings of the …, 2024 - aclanthology.org
Large language models (LLMs) have demonstrated remarkable capabilities across a wide
range of tasks in various domains. Despite their impressive performance, they can be …

Conformal prediction with large language models for multi-choice question answering

B Kumar, C Lu, G Gupta, A Palepu, D Bellamy… - arXiv preprint arXiv …, 2023 - arxiv.org
As large language models continue to be widely developed, robust uncertainty
quantification techniques will become crucial for their safe deployment in high-stakes …

Large Language Models Help Humans Verify Truthfulness--Except When They Are Convincingly Wrong

C Si, N Goyal, ST Wu, C Zhao, S Feng… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) are increasingly used for accessing information on the web.
Their truthfulness and factuality are thus of great interest. To help users make the right …

Calibrated interpretation: Confidence estimation in semantic parsing

E Stengel-Eskin, B Van Durme - Transactions of the Association for …, 2023 - direct.mit.edu
Sequence generation models are increasingly being used to translate natural language into
programs, i.e., to perform executable semantic parsing. The fact that semantic parsing aims to …

Exploring large language models for multi-modal out-of-distribution detection

Y Dai, H Lang, K Zeng, F Huang, Y Li - arXiv preprint arXiv:2310.08027, 2023 - arxiv.org
Out-of-distribution (OOD) detection is essential for reliable and trustworthy machine learning.
Recent multi-modal OOD detection leverages textual information from in-distribution (ID) …

Perceptions of linguistic uncertainty by language models and humans

CG Belem, M Kelly, M Steyvers, S Singh… - arXiv preprint arXiv …, 2024 - arxiv.org
_Uncertainty expressions_ such as "probably" or "highly unlikely" are pervasive in human
language. While prior work has established that there is population-level agreement in terms …

Predict the Next Word: <Humans Exhibit Uncertainty in this Task and Language Models _>

E Ilia, W Aziz - Proceedings of the 18th Conference of the …, 2024 - aclanthology.org
Language models (LMs) are statistical models trained to assign probability to
human-generated text. As such, it is reasonable to question whether they approximate …

Temperature-scaling surprisal estimates improve fit to human reading times – but does it do so for the “right reasons”?

T Liu, I Škrjanec, V Demberg - ICLR 2024 Workshop on …, 2024 - openreview.net
A wide body of evidence shows that human language processing difficulty is predicted by
the information-theoretic measure surprisal, a word's negative log probability in context …
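The snippet above defines surprisal as a word's negative log probability in context, and the title refers to temperature-scaling those estimates. As a rough illustration only (not the authors' code or setup), the sketch below assumes the common form of temperature scaling, in which a model's next-word logits are divided by a temperature T before the softmax, and reports surprisal in bits (base 2). The function names and the toy logits are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw next-word logits into probabilities after dividing by a temperature.

    Assumption: temperature scaling is applied as logits / T before the softmax.
    T > 1 flattens the distribution; T < 1 sharpens it.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def surprisal(logits, target_index, temperature=1.0, base=2.0):
    """Surprisal of the observed word: -log_base p(word | context)."""
    probs = softmax_with_temperature(logits, temperature)
    return -math.log(probs[target_index], base)


# Hypothetical logits over a 3-word vocabulary; index 0 is the word actually read.
toy_logits = [2.0, 1.0, 0.0]
for t in (0.5, 1.0, 2.0):
    print(f"T={t}: surprisal = {surprisal(toy_logits, 0, temperature=t):.3f} bits")
```

Because a higher temperature flattens the distribution, the likeliest word's surprisal rises and the rarer words' surprisal falls as T increases, which is how temperature scaling can change the fit between surprisal and reading times.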