Getting aligned on representational alignment

I Sucholutsky, L Muttenthaler, A Weller, A Peng… - arXiv preprint arXiv …, 2023 - arxiv.org
Biological and artificial information processing systems form representations that they can
use to categorize, reason, plan, navigate, and make decisions. How can we measure the …

Effective human-AI teams via learned natural language rules and onboarding

H Mozannar, J Lee, D Wei, P Sattigeri… - Advances in …, 2024 - proceedings.neurips.cc
People are relying on AI agents to assist them with various tasks. The human must know
when to rely on the agent, collaborate with the agent, or ignore its suggestions. In this work …

The empty signifier problem: Towards clearer paradigms for operationalising "alignment" in large language models

HR Kirk, B Vidgen, P Röttger, SA Hale - arXiv preprint arXiv:2310.02457, 2023 - arxiv.org
In this paper, we address the concept of "alignment" in large language models (LLMs)
through the lens of post-structuralist socio-political theory, specifically examining its parallels …

Contrastive Explanations That Anticipate Human Misconceptions Can Improve Human Decision-Making Skills

Z Buçinca, S Swaroop, AE Paluch… - arXiv preprint arXiv …, 2024 - arxiv.org
People's decision-making abilities often fail to improve or may even erode when they rely on
AI for decision-support, even when the AI provides informative explanations. We argue this …

Modulating language model experiences through frictions

KM Collins, V Chen, I Sucholutsky, HR Kirk… - arXiv preprint arXiv …, 2024 - arxiv.org
Language models are transforming the ways that their users engage with the world. Despite
impressive capabilities, over-consumption of language model outputs risks propagating …

Accuracy-Time Tradeoffs in AI-Assisted Decision Making under Time Pressure

S Swaroop, Z Buçinca, KZ Gajos… - Proceedings of the 29th …, 2024 - dl.acm.org
In settings where users both need high accuracy and are time-pressured, such as doctors
working in emergency rooms, we want to provide AI assistance that both increases decision …

Towards Optimizing Human-Centric Objectives in AI-Assisted Decision-Making With Offline Reinforcement Learning

Z Buçinca, S Swaroop, AE Paluch, SA Murphy… - arXiv preprint arXiv …, 2024 - arxiv.org
As AI assistance is increasingly infused into decision-making processes, we may seek to
optimize human-centric objectives beyond decision accuracy, such as skill improvement or …

Revisiting Rogers' Paradox in the Context of Human-AI Interaction

KM Collins, U Bhatt, I Sucholutsky - arXiv preprint arXiv:2501.10476, 2025 - arxiv.org
Humans learn about the world, and how to act in the world, in many ways: from individually
conducting experiments to observing and reproducing others' behavior. Different learning …

Trustworthy Machine Learning: From Algorithmic Transparency to Decision Support

U Bhatt - 2024 - repository.cam.ac.uk
Developing machine learning models worthy of decision-maker trust is crucial to using
models in practice. Algorithmic transparency tools, such as explainability and uncertainty …

Training Human-AI Teams

H Mozannar - 2024 - dspace.mit.edu
AI systems are augmenting humans' capabilities in settings such as healthcare and
programming, forming human-AI teams. To enable more accurate and timely decisions, we …