Larger language models do in-context learning differently

J Wei, J Wei, Y Tay, D Tran, A Webson, Y Lu… - arXiv preprint arXiv …, 2023 - arxiv.org
We study how in-context learning (ICL) in language models is affected by semantic priors
versus input-label mappings. We investigate two setups: ICL with flipped labels and ICL with …
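
As a quick illustration of the first setup the abstract names, a flipped-label ICL prompt can be built by inverting every exemplar's label, so a model that follows the in-context input-label mapping (rather than its semantic priors) should predict the flipped label. The sentiment task and label strings below are illustrative, not from the paper:

```python
def build_prompt(examples, query, flip=False):
    """Format few-shot exemplars; optionally invert every label."""
    flip_map = {"positive": "negative", "negative": "positive"}
    lines = []
    for text, label in examples:
        shown = flip_map[label] if flip else label
        lines.append(f"Review: {text}\nSentiment: {shown}")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

demos = [
    ("A delightful, moving film.", "positive"),
    ("Dull and far too long.", "negative"),
]

# With flip=True, overriding semantic priors means answering "negative"
# for a clearly positive review.
print(build_prompt(demos, "An absolute joy to watch.", flip=True))
```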

Bypassing the exponential dependency: Looped transformers efficiently learn in-context by multi-step gradient descent

B Chen, X Li, Y Liang, Z Shi, Z Song - arXiv preprint arXiv:2410.11268, 2024 - arxiv.org
In-context learning has been recognized as a key factor in the success of Large Language
Models (LLMs). It refers to the model's ability to learn patterns on the fly from provided in …
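
For intuition on the claim, the toy problem usually studied in this line of work is in-context linear regression; the sketch below runs explicit multi-step gradient descent on the prompt's (x, y) pairs, the procedure a looped transformer is argued to emulate. Dimensions and the learning rate are arbitrary choices:

```python
import numpy as np

# Toy in-context regression: the "prompt" is n (x, y) pairs from an
# unknown w_star; we run several explicit gradient steps on the
# in-context squared loss, one step per "loop" iteration.
rng = np.random.default_rng(0)
d, n, steps, lr = 5, 20, 100, 0.1

w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_star

w = np.zeros(d)
for _ in range(steps):
    grad = X.T @ (X @ w - y) / n   # gradient of the in-context squared loss
    w -= lr * grad

x_query = rng.normal(size=d)
print("query prediction error:", abs(x_query @ w - x_query @ w_star))
```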

Think twice before assure: Confidence estimation for large language models through reflection on multiple answers

M Li, W Wang, F Feng, F Zhu, Q Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Confidence estimation, which aims to evaluate output trustworthiness, is crucial for the application
of large language models (LLMs), especially black-box ones. Existing confidence estimation …
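
The snippet is cut off, but the title points at estimating confidence for black-box LLMs by reflecting on multiple sampled answers. The sketch below shows only the generic agreement-based baseline this builds on; `query_llm` is a hypothetical stand-in for a black-box API call, and the paper's reflection step is more involved:

```python
import random
from collections import Counter

def agreement_confidence(query_llm, prompt, k=10):
    """Sample k answers; use the majority fraction as a confidence score.

    query_llm(prompt) is a hypothetical black-box call returning one
    sampled answer string.
    """
    answers = [query_llm(prompt) for _ in range(k)]
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / k

# Toy stochastic stand-in for a black-box model.
fake_llm = lambda _prompt: random.choice(["42", "42", "42", "41"])
print(agreement_confidence(fake_llm, "What is 6 x 7?"))
```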

Calibrating language models with adaptive temperature scaling

J Xie, AS Chen, Y Lee, E Mitchell, C Finn - arXiv preprint arXiv:2409.19817, 2024 - arxiv.org
The effectiveness of large language models (LLMs) is not only measured by their ability to
generate accurate outputs but also by their calibration: how well their confidence scores …
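
For context, classic temperature scaling fits a single scalar T that divides the logits so that softmax confidences match accuracy on held-out data; the paper's adaptive variant predicts T per input instead. A minimal sketch of the single-scalar baseline, fitting T by grid search over validation negative log-likelihood:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    """Single-scalar temperature scaling: pick T minimizing the NLL of
    softmax(logits / T). The adaptive method predicts a per-input T."""
    def nll(T):
        p = softmax(logits / T)
        return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return min(grid, key=nll)

# Toy overconfident logits: inflating them 4x should give T near 4.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=200)
logits = rng.normal(size=(200, 3))
logits[np.arange(200), labels] += 1.0
logits *= 4.0
print("fitted temperature:", fit_temperature(logits, labels))
```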

Think twice before trusting: Self-detection for large language models through comprehensive answer reflection

M Li, W Wang, F Feng, F Zhu, Q Wang… - Findings of the …, 2024 - aclanthology.org
Self-detection for Large Language Models (LLMs) seeks to evaluate the
trustworthiness of the LLM's output by leveraging its own capabilities, thereby alleviating the …

Evaluating language models as risk scores

AF Cruz, M Hardt, C Mendler-Dünner - arXiv preprint arXiv:2407.14614, 2024 - arxiv.org
Current question-answering benchmarks predominantly focus on accuracy in realizable
prediction tasks. Conditioned on a question and answer-key, does the most likely token …
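
Treating model probabilities as risk scores makes calibration, not just accuracy, the quantity of interest. A standard way to measure it is expected calibration error over confidence bins; a minimal sketch (binning scheme and data are illustrative):

```python
import numpy as np

def expected_calibration_error(probs, outcomes, n_bins=10):
    """Bin predicted risk scores and compare the mean prediction to the
    empirical outcome rate in each bin (standard ECE)."""
    probs, outcomes = np.asarray(probs), np.asarray(outcomes)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs >= lo) & (probs < hi)
        if mask.any():
            gap = abs(probs[mask].mean() - outcomes[mask].mean())
            ece += mask.mean() * gap
    return ece

rng = np.random.default_rng(0)
p = rng.uniform(size=1000)
y = rng.uniform(size=1000) < p        # perfectly calibrated scores
print("ECE (calibrated):   ", expected_calibration_error(p, y))
print("ECE (overconfident):", expected_calibration_error(p**0.25, y))
```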

Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach

L Liu, Y Pan, X Li, G Chen - arXiv preprint arXiv:2404.15993, 2024 - arxiv.org
Large language models (LLMs) are highly capable across many tasks, but they can sometimes
generate unreliable or inaccurate outputs. To tackle this issue, this paper studies the …
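
The title suggests training a lightweight supervised predictor on features of the LLM's generation to estimate whether an answer is correct. A hedged sketch of that general idea using scikit-learn; the synthetic features here are stand-ins, not the paper's feature set:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Supervised uncertainty estimation, sketched: features extracted from
# the model's output (here: synthetic stand-ins, e.g. mean logprob or
# entropy in practice) -> probability that the answer is correct.
rng = np.random.default_rng(0)
n = 500
features = rng.normal(size=(n, 4))
correct = (features[:, 0] + 0.5 * rng.normal(size=n)) > 0

clf = LogisticRegression().fit(features[:400], correct[:400])
conf = clf.predict_proba(features[400:])[:, 1]   # P(correct) per answer
print("held-out accuracy of correctness predictor:",
      clf.score(features[400:], correct[400:]))
print("mean predicted P(correct):", conf.mean().round(3))
```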

Understanding the Effects of Iterative Prompting on Truthfulness

S Krishna, C Agarwal, H Lakkaraju - arXiv preprint arXiv:2402.06625, 2024 - arxiv.org
The development of Large Language Models (LLMs) has notably transformed numerous
sectors, offering impressive text generation capabilities. Yet, the reliability and truthfulness of …
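
The pattern under study here is simple: re-ask the model about its own answer and let it revise. A generic sketch of such a loop; `query_llm` and the follow-up wording are hypothetical, and the paper's point is to measure whether loops like this help or hurt truthfulness, not to recommend them:

```python
def iterative_prompting(query_llm, question, rounds=3):
    """Generic refine-by-re-asking loop (the pattern the paper studies).

    query_llm(prompt) is a hypothetical black-box call; the follow-up
    phrasing is illustrative.
    """
    answer = query_llm(question)
    history = [answer]
    for _ in range(rounds - 1):
        follow_up = (f"Question: {question}\nYour previous answer: {answer}\n"
                     "Are you sure? Reply with your final answer.")
        answer = query_llm(follow_up)
        history.append(answer)
    return answer, history

# Toy model that capitulates under repeated questioning.
replies = iter(["Paris", "Paris", "Lyon"])
flaky_llm = lambda _prompt: next(replies)
print(iterative_prompting(flaky_llm, "What is the capital of France?"))
```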

Calibrate to Discriminate: Improve In-Context Learning with Label-Free Comparative Inference

W Cheng, T Wang, Y Ji, F Yang, K Tan… - arXiv preprint arXiv …, 2024 - arxiv.org
While in-context learning with large language models (LLMs) has shown impressive
performance, we have discovered a unique miscalibration behavior where both correct and …

Calibration-Tuning: Teaching Large Language Models to Know What They Don't Know

S Kapoor, N Gruver, M Roberts, A Pal… - Proceedings of the …, 2024 - aclanthology.org
Large language models are increasingly deployed for high-stakes decision making, for
example in financial and medical applications. In such applications, it is imperative that we …