Larger language models do in-context learning differently

J Wei, J Wei, Y Tay, D Tran, A Webson, Y Lu… - arXiv preprint arXiv …, 2023 - arxiv.org
We study how in-context learning (ICL) in language models is affected by semantic priors
versus input-label mappings. We investigate two setups: ICL with flipped labels and ICL with …
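
As a quick illustration of the first setup the abstract names, a flipped-label ICL prompt can be built by inverting every exemplar's label, so a model that follows the in-context input-label mapping (rather than its semantic priors) should predict the flipped label. The sentiment task and label strings below are illustrative, not from the paper:

```python
def build_prompt(examples, query, flip=False):
    """Format few-shot exemplars; optionally invert every label."""
    flip_map = {"positive": "negative", "negative": "positive"}
    lines = []
    for text, label in examples:
        shown = flip_map[label] if flip else label
        lines.append(f"Review: {text}\nSentiment: {shown}")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

demos = [
    ("A delightful, moving film.", "positive"),
    ("Dull and far too long.", "negative"),
]

# With flip=True, overriding semantic priors means answering "negative"
# for a clearly positive review.
print(build_prompt(demos, "An absolute joy to watch.", flip=True))
```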

Bypassing the exponential dependency: Looped transformers efficiently learn in-context by multi-step gradient descent

B Chen, X Li, Y Liang, Z Shi, Z Song - arXiv preprint arXiv:2410.11268, 2024 - arxiv.org
In-context learning has been recognized as a key factor in the success of Large Language
Models (LLMs). It refers to the model's ability to learn patterns on the fly from provided in …
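
For intuition on the claim, the toy problem usually studied in this line of work is in-context linear regression; the sketch below runs explicit multi-step gradient descent on the prompt's (x, y) pairs, the procedure a looped transformer is argued to emulate. Dimensions and the learning rate are arbitrary choices:

```python
import numpy as np

# Toy in-context regression: the "prompt" is n (x, y) pairs from an
# unknown w_star; we run several explicit gradient steps on the
# in-context squared loss, one step per "loop" iteration.
rng = np.random.default_rng(0)
d, n, steps, lr = 5, 20, 100, 0.1

w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_star

w = np.zeros(d)
for _ in range(steps):
    grad = X.T @ (X @ w - y) / n   # gradient of the in-context squared loss
    w -= lr * grad

x_query = rng.normal(size=d)
print("query prediction error:", abs(x_query @ w - x_query @ w_star))
```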

Think twice before assure: Confidence estimation for large language models through reflection on multiple answers

M Li, W Wang, F Feng, F Zhu, Q Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Confidence estimation, which aims to evaluate output trustworthiness, is crucial for the application
of large language models (LLMs), especially black-box ones. Existing confidence estimation …
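
The snippet is cut off, but the title points at estimating confidence for black-box LLMs by reflecting on multiple sampled answers. The sketch below shows only the generic agreement-based baseline this builds on; `query_llm` is a hypothetical stand-in for a black-box API call, and the paper's reflection step is more involved:

```python
import random
from collections import Counter

def agreement_confidence(query_llm, prompt, k=10):
    """Sample k answers; use the majority fraction as a confidence score.

    query_llm(prompt) is a hypothetical black-box call returning one
    sampled answer string.
    """
    answers = [query_llm(prompt) for _ in range(k)]
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / k

# Toy stochastic stand-in for a black-box model.
fake_llm = lambda _prompt: random.choice(["42", "42", "42", "41"])
print(agreement_confidence(fake_llm, "What is 6 x 7?"))
```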

Calibrating language models with adaptive temperature scaling

J Xie, AS Chen, Y Lee, E Mitchell, C Finn - arXiv preprint arXiv:2409.19817, 2024 - arxiv.org
The effectiveness of large language models (LLMs) is not only measured by their ability to
generate accurate outputs but also by their calibration: how well their confidence scores …
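
For context, classic temperature scaling fits a single scalar T that divides the logits so that softmax confidences match accuracy on held-out data; the paper's adaptive variant predicts T per input instead. A minimal sketch of the single-scalar baseline, fitting T by grid search over validation negative log-likelihood:

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fit_temperature(logits, labels, grid=np.linspace(0.5, 5.0, 91)):
    """Single-scalar temperature scaling: pick T minimizing the NLL of
    softmax(logits / T). The adaptive method predicts a per-input T."""
    def nll(T):
        p = softmax(logits / T)
        return -np.log(p[np.arange(len(labels)), labels] + 1e-12).mean()
    return min(grid, key=nll)

# Toy overconfident logits: inflating them 4x should give T near 4.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=200)
logits = rng.normal(size=(200, 3))
logits[np.arange(200), labels] += 1.0
logits *= 4.0
print("fitted temperature:", fit_temperature(logits, labels))
```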

Think twice before trusting: Self-detection for large language models through comprehensive answer reflection

M Li, W Wang, F Feng, F Zhu, Q Wang… - Findings of the …, 2024 - aclanthology.org
Self-detection for Large Language Models (LLMs) seeks to evaluate the
trustworthiness of the LLM's output by leveraging its own capabilities, thereby alleviating the …

Evaluating language models as risk scores

AF Cruz, M Hardt, C Mendler-Dünner - arXiv preprint arXiv:2407.14614, 2024 - arxiv.org
Current question-answering benchmarks predominantly focus on accuracy in realizable
prediction tasks. Conditioned on a question and answer-key, does the most likely token …
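
Treating model probabilities as risk scores makes calibration, not just accuracy, the quantity of interest. A standard way to measure it is expected calibration error over confidence bins; a minimal sketch (binning scheme and data are illustrative):

```python
import numpy as np

def expected_calibration_error(probs, outcomes, n_bins=10):
    """Bin predicted risk scores and compare the mean prediction to the
    empirical outcome rate in each bin (standard ECE)."""
    probs, outcomes = np.asarray(probs), np.asarray(outcomes)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs >= lo) & (probs < hi)
        if mask.any():
            gap = abs(probs[mask].mean() - outcomes[mask].mean())
            ece += mask.mean() * gap
    return ece

rng = np.random.default_rng(0)
p = rng.uniform(size=1000)
y = rng.uniform(size=1000) < p        # perfectly calibrated scores
print("ECE (calibrated):   ", expected_calibration_error(p, y))
print("ECE (overconfident):", expected_calibration_error(p**0.25, y))
```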

Uncertainty Estimation and Quantification for LLMs: A Simple Supervised Approach

L Liu, Y Pan, X Li, G Chen - arXiv preprint arXiv:2404.15993, 2024 - arxiv.org
Large language models (LLMs) are highly capable across many tasks, but they can sometimes
generate unreliable or inaccurate outputs. To tackle this issue, this paper studies the …
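
The title suggests training a lightweight supervised predictor on features of the LLM's generation to estimate whether an answer is correct. A hedged sketch of that general idea using scikit-learn; the synthetic features here are stand-ins, not the paper's feature set:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Supervised uncertainty estimation, sketched: features extracted from
# the model's output (here: synthetic stand-ins, e.g. mean logprob or
# entropy in practice) -> probability that the answer is correct.
rng = np.random.default_rng(0)
n = 500
features = rng.normal(size=(n, 4))
correct = (features[:, 0] + 0.5 * rng.normal(size=n)) > 0

clf = LogisticRegression().fit(features[:400], correct[:400])
conf = clf.predict_proba(features[400:])[:, 1]   # P(correct) per answer
print("held-out accuracy of correctness predictor:",
      clf.score(features[400:], correct[400:]))
print("mean predicted P(correct):", conf.mean().round(3))
```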

Understanding the Effects of Iterative Prompting on Truthfulness

S Krishna, C Agarwal, H Lakkaraju - arXiv preprint arXiv:2402.06625, 2024 - arxiv.org
The development of Large Language Models (LLMs) has notably transformed numerous
sectors, offering impressive text generation capabilities. Yet, the reliability and truthfulness of …
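
The pattern under study here is simple: re-ask the model about its own answer and let it revise. A generic sketch of such a loop; `query_llm` and the follow-up wording are hypothetical, and the paper's point is to measure whether loops like this help or hurt truthfulness, not to recommend them:

```python
def iterative_prompting(query_llm, question, rounds=3):
    """Generic refine-by-re-asking loop (the pattern the paper studies).

    query_llm(prompt) is a hypothetical black-box call; the follow-up
    phrasing is illustrative.
    """
    answer = query_llm(question)
    history = [answer]
    for _ in range(rounds - 1):
        follow_up = (f"Question: {question}\nYour previous answer: {answer}\n"
                     "Are you sure? Reply with your final answer.")
        answer = query_llm(follow_up)
        history.append(answer)
    return answer, history

# Toy model that capitulates under repeated questioning.
replies = iter(["Paris", "Paris", "Lyon"])
flaky_llm = lambda _prompt: next(replies)
print(iterative_prompting(flaky_llm, "What is the capital of France?"))
```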

Calibrate to Discriminate: Improve In-Context Learning with Label-Free Comparative Inference

W Cheng, T Wang, Y Ji, F Yang, K Tan… - arXiv preprint arXiv …, 2024 - arxiv.org
While in-context learning with large language models (LLMs) has shown impressive
performance, we have discovered a unique miscalibration behavior where both correct and …

Calibration-Tuning: Teaching Large Language Models to Know What They Don't Know

S Kapoor, N Gruver, M Roberts, A Pal… - Proceedings of the …, 2024 - aclanthology.org
Large language models are increasingly deployed for high-stakes decision making, for
example in financial and medical applications. In such applications, it is imperative that we …