Applying interpretable machine learning in computational biology—pitfalls, recommendations and opportunities for new developments

V Chen, M Yang, W Cui, JS Kim, A Talwalkar, J Ma - Nature Methods, 2024 - nature.com
Recent advances in machine learning have enabled the development of next-generation
predictive models for complex computational biology problems, thereby spurring the use of …

Do Androids Know They're Only Dreaming of Electric Sheep?

S CH-Wang, B Van Durme, J Eisner… - Findings of the …, 2024 - aclanthology.org
We design probes trained on the internal representations of a transformer language model
to predict its hallucinatory behavior on three grounded generation tasks. To train the probes …
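As a generic illustration of the probing setup this snippet describes, the short sketch below fits a linear probe (logistic regression) on per-example hidden-state vectors to predict a binary hallucination label. The random features, toy labels, feature dimension and probe choice are placeholders for exposition only, not the paper's tasks, data or probe design.

# Minimal sketch of a hidden-state probe; all data here is synthetic/toy.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
hidden_states = rng.normal(size=(1000, 768))  # one hidden-state vector per generated example (toy)
hallucinated = rng.integers(0, 2, size=1000)  # 1 = output not supported by the grounding source (toy labels)

X_train, X_test, y_train, y_test = train_test_split(
    hidden_states, hallucinated, test_size=0.2, random_state=0
)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(f"held-out probe accuracy: {probe.score(X_test, y_test):.3f}")

In a real setting the features would be activations extracted from a chosen transformer layer and the labels would come from annotating whether each generation is grounded; the probe's held-out accuracy is then read as evidence of how much hallucination-predictive signal those internal representations carry.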

Unfamiliar finetuning examples control how language models hallucinate

K Kang, E Wallace, C Tomlin, A Kumar… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models are known to hallucinate when faced with unfamiliar queries, but
the underlying mechanisms that govern how models hallucinate are not yet fully understood …

Eight methods to evaluate robust unlearning in LLMs

A Lynch, P Guo, A Ewart, S Casper… - arXiv preprint arXiv …, 2024 - arxiv.org
Machine unlearning can be useful for removing harmful capabilities and memorized text
from large language models (LLMs), but there are not yet standardized methods for …

A study on the calibration of in-context learning

H Zhang, YF Zhang, Y Yu, D Madeka, D Foster… - arXiv preprint arXiv …, 2023 - arxiv.org
Accurate uncertainty quantification is crucial for the safe deployment of machine learning
models, and prior research has demonstrated improvements in the calibration of modern …

Epistemic injustice in generative AI

J Kay, A Kasirzadeh, S Mohamed - … of the AAAI/ACM Conference on AI …, 2024 - ojs.aaai.org
This paper investigates how generative AI can potentially undermine the integrity of
collective knowledge and the processes we rely on to acquire, assess, and trust information …

The unreasonable effectiveness of easy training data for hard tasks

P Hase, M Bansal, P Clark, S Wiegreffe - arXiv preprint arXiv:2401.06751, 2024 - arxiv.org
How can we train models to perform well on hard test data when hard training data is by
definition difficult to label correctly? This question has been termed the scalable oversight …

Benchmarking mental state representations in language models

M Bortoletto, C Ruhdorfer, L Shi, A Bulling - arXiv preprint arXiv …, 2024 - arxiv.org
While numerous works have assessed the generative performance of language models
(LMs) on tasks requiring Theory of Mind reasoning, research into the models' internal …

LLM internal states reveal hallucination risk faced with a query

Z Ji, D Chen, E Ishii, S Cahyawijaya, Y Bang… - arXiv preprint arXiv …, 2024 - arxiv.org
The hallucination problem of Large Language Models (LLMs) significantly limits their
reliability and trustworthiness. Humans have a self-awareness process that allows us to …

Insights into LLM long-context failures: When transformers know but don't tell

M Gao, TM Lu, K Yu, A Byerly… - Findings of the …, 2024 - aclanthology.org
Large Language Models (LLMs) exhibit positional bias, struggling to utilize
information from the middle or end of long contexts. Our study explores LLMs' long-context …