Applying interpretable machine learning in computational biology—pitfalls, recommendations and opportunities for new developments

V Chen, M Yang, W Cui, JS Kim, A Talwalkar, J Ma - Nature Methods, 2024 - nature.com
Recent advances in machine learning have enabled the development of next-generation
predictive models for complex computational biology problems, thereby spurring the use of …

Do Androids Know They're Only Dreaming of Electric Sheep?

S CH-Wang, B Van Durme, J Eisner… - Findings of the …, 2024 - aclanthology.org
We design probes trained on the internal representations of a transformer language model
to predict its hallucinatory behavior on three grounded generation tasks. To train the probes …
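As a generic illustration of the probing setup this snippet describes, the short sketch below fits a linear probe (logistic regression) on per-example hidden-state vectors to predict a binary hallucination label. The random features, toy labels, feature dimension and probe choice are placeholders for exposition only, not the paper's tasks, data or probe design.

# Minimal sketch of a hidden-state probe; all data here is synthetic/toy.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(1000, 768))  # one hidden-state vector per generated example (toy)
hallucinated = rng.integers(0, 2, size=1000)  # 1 = output not supported by the grounding source (toy labels)

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, hallucinated, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out probe accuracy: {probe.score(X_test, y_test):.3f}")

In a real setting the features would be activations extracted from a chosen transformer layer and the labels would come from annotating whether each generation is grounded; the probe's held-out accuracy is then read as evidence of how much hallucination-predictive signal those internal representations carry.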

Unfamiliar finetuning examples control how language models hallucinate

K Kang, E Wallace, C Tomlin, A Kumar… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models are known to hallucinate when faced with unfamiliar queries, but
the underlying mechanisms that govern how models hallucinate are not yet fully understood …

Eight methods to evaluate robust unlearning in LLMs

A Lynch, P Guo, A Ewart, S Casper… - arXiv preprint arXiv …, 2024 - arxiv.org
Machine unlearning can be useful for removing harmful capabilities and memorized text
from large language models (LLMs), but there are not yet standardized methods for …

A study on the calibration of in-context learning

H Zhang, YF Zhang, Y Yu, D Madeka, D Foster… - arXiv preprint arXiv …, 2023 - arxiv.org
Accurate uncertainty quantification is crucial for the safe deployment of machine learning
models, and prior research has demonstrated improvements in the calibration of modern …

Epistemic injustice in generative AI

J Kay, A Kasirzadeh, S Mohamed - … of the AAAI/ACM Conference on AI …, 2024 - ojs.aaai.org
This paper investigates how generative AI can potentially undermine the integrity of
collective knowledge and the processes we rely on to acquire, assess, and trust information …

The unreasonable effectiveness of easy training data for hard tasks

P Hase, M Bansal, P Clark, S Wiegreffe - arXiv preprint arXiv:2401.06751, 2024 - arxiv.org
How can we train models to perform well on hard test data when hard training data is by
definition difficult to label correctly? This question has been termed the scalable oversight …

Benchmarking mental state representations in language models

M Bortoletto, C Ruhdorfer, L Shi, A Bulling - arXiv preprint arXiv …, 2024 - arxiv.org
While numerous works have assessed the generative performance of language models
(LMs) on tasks requiring Theory of Mind reasoning, research into the models' internal …

LLM internal states reveal hallucination risk faced with a query

Z Ji, D Chen, E Ishii, S Cahyawijaya, Y Bang… - arXiv preprint arXiv …, 2024 - arxiv.org
The hallucination problem of Large Language Models (LLMs) significantly limits their
reliability and trustworthiness. Humans have a self-awareness process that allows us to …

Insights into LLM long-context failures: When transformers know but don't tell

M Gao, TM Lu, K Yu, A Byerly… - Findings of the …, 2024 - aclanthology.org
Large Language Models (LLMs) exhibit positional bias, struggling to utilize
information from the middle or end of long contexts. Our study explores LLMs' long-context …