Re-examining calibration: The case of question answering

J Geng, F Cai, Y Wang, H Koeppl, P Nakov… - arXiv preprint arXiv …, 2023 - arxiv.org

Language models (LMs) have demonstrated remarkable capabilities across a wide range of
tasks in various domains. Despite their impressive performance, the reliability of their output …

Cited by 30 · All 2 versions

[PDF] arxiv.org

Can LLMs express their uncertainty? An empirical evaluation of confidence elicitation in LLMs

M Xiong, Z Hu, X Lu, Y Li, J Fu, J He, B Hooi - arXiv preprint arXiv …, 2023 - arxiv.org

The task of empowering large language models (LLMs) to accurately express their
confidence, referred to as confidence elicitation, is essential in ensuring reliable and …

Cited by 244 · All 3 versions

[PDF] aclanthology.org

A Survey of Confidence Estimation and Calibration in Large Language Models

J Geng, F Cai, Y Wang, H Koeppl… - Proceedings of the …, 2024 - aclanthology.org

Large language models (LLMs) have demonstrated remarkable capabilities across a wide
range of tasks in various domains. Despite their impressive performance, they can be …

Cited by 31

[PDF] arxiv.org

Conformal prediction with large language models for multi-choice question answering

B Kumar, C Lu, G Gupta, A Palepu, D Bellamy… - arXiv preprint arXiv …, 2023 - arxiv.org

As large language models continue to be widely developed, robust uncertainty
quantification techniques will become crucial for their safe deployment in high-stakes …

Cited by 52 · All 3 versions

[PDF] arxiv.org

Large Language Models Help Humans Verify Truthfulness--Except When They Are Convincingly Wrong

C Si, N Goyal, ST Wu, C Zhao, S Feng… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Language Models (LLMs) are increasingly used for accessing information on the web.
Their truthfulness and factuality are thus of great interest. To help users make the right …

Cited by 19 · All 6 versions

[PDF] mit.edu

Calibrated interpretation: Confidence estimation in semantic parsing

E Stengel-Eskin, B Van Durme - Transactions of the Association for …, 2023 - direct.mit.edu

Sequence generation models are increasingly being used to translate natural language into
programs, i.e., to perform executable semantic parsing. The fact that semantic parsing aims to …

Cited by 19 · All 9 versions

[PDF] arxiv.org

Exploring large language models for multi-modal out-of-distribution detection

Y Dai, H Lang, K Zeng, F Huang, Y Li - arXiv preprint arXiv:2310.08027, 2023 - arxiv.org

Out-of-distribution (OOD) detection is essential for reliable and trustworthy machine learning.
Recent multi-modal OOD detection leverages textual information from in-distribution (ID) …

Cited by 17 · All 4 versions

[PDF] arxiv.org

Perceptions of linguistic uncertainty by language models and humans

CG Belem, M Kelly, M Steyvers, S Singh… - arXiv preprint arXiv …, 2024 - arxiv.org

_Uncertainty expressions_ such as "probably" or "highly unlikely" are pervasive in human
language. While prior work has established that there is population-level agreement in terms …

Cited by 3 · All 6 versions

[PDF] aclanthology.org

[PDF] Predict the Next Word: <Humans Exhibit Uncertainty in this Task and Language Models _>

E Ilia, W Aziz - Proceedings of the 18th Conference of the …, 2024 - aclanthology.org

Language models (LMs) are statistical models trained to assign probability to
human-generated text. As such, it is reasonable to question whether they approximate …

Cited by 3

[PDF] openreview.net

Temperature-scaling surprisal estimates improve fit to human reading times – but does it do so for the “right reasons”?

T Liu, I Škrjanec, V Demberg - ICLR 2024 Workshop on …, 2024 - openreview.net

A wide body of evidence shows that human language processing difficulty is predicted by
the information-theoretic measure surprisal, a word's negative log probability in context …

Cited by 4 · All 2 versions
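The snippet above defines surprisal as a word's negative log probability in context. The following minimal Python sketch shows how a temperature T rescales a model's next-word logits before the softmax, and how that changes surprisal. It assumes the model exposes raw logits. The function names and toy logit values are hypothetical illustrations, not the paper's code. Reporting surprisal in bits (log base 2) is also an assumption; many papers use nats.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax: divide logits by T, then normalize."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def surprisal_bits(logits, target_index, temperature=1.0):
    """Surprisal of the target word in bits: -log2 p(word | context)."""
    probs = softmax(logits, temperature)
    return -math.log2(probs[target_index])

# Hypothetical logits for a three-word vocabulary in some context.
toy_logits = [2.0, 1.0, 0.0]
for t in (1.0, 2.0):
    print(t, surprisal_bits(toy_logits, 2, t))
```

With T > 1 the distribution flattens. Unlikely words therefore get lower surprisal and the most likely word gets higher surprisal. With T < 1 the distribution sharpens and the effect reverses.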
