Eliciting and learning with soft labels from every annotator

KM Collins, U Bhatt, A Weller - Proceedings of the AAAI Conference on …, 2022 - ojs.aaai.org
The labels used to train machine learning (ML) models are of paramount importance.
Typically for ML classification tasks, datasets contain hard labels, yet learning using soft …

Human uncertainty in concept-based AI systems

KM Collins, M Barker, M Espinosa Zarlenga… - Proceedings of the …, 2023 - dl.acm.org
Placing a human in the loop may help abate the risks of deploying AI systems in
safety-critical settings (e.g., a clinician working with a medical AI system). However, mitigating risks …

Distributional preference learning: Understanding and accounting for hidden context in RLHF

A Siththaranjan, C Laidlaw… - arXiv preprint arXiv …, 2023 - arxiv.org
In practice, preference learning from human feedback depends on incomplete data with
hidden context. Hidden context refers to data that affects the feedback received, but which is …

Personalizing reinforcement learning from human feedback with variational preference learning

S Poddar, Y Wan, H Ivison, A Gupta… - arXiv preprint arXiv …, 2024 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) is a powerful paradigm for aligning
foundation models to human values and preferences. However, current RLHF techniques …

Learning personalized decision support policies

U Bhatt, V Chen, KM Collins, P Kamalaruban… - arXiv preprint arXiv …, 2023 - arxiv.org
Individual human decision-makers may benefit from different forms of support to improve
decision outcomes. However, a key question is which form of support will lead to accurate …

Examining Responsibility and Deliberation in AI Impact Statements and Ethics Reviews

D Liu, P Nanayakkara, SA Sakha… - Proceedings of the …, 2022 - dl.acm.org
The artificial intelligence research community is continuing to grapple with the ethics of its
work by encouraging researchers to discuss potential positive and negative consequences …

Understanding hidden context in preference learning: Consequences for RLHF

A Siththaranjan, C Laidlaw… - Socially Responsible …, 2023 - openreview.net
In practice, preference learning from human feedback depends on incomplete data with
hidden context. Hidden context refers to data that affects the feedback received, but which is …

Learning noise-induced reward functions for surpassing demonstrations in imitation learning

L Huo, Z Wang, M Xu - Proceedings of the AAAI Conference on Artificial …, 2023 - ojs.aaai.org
Imitation learning (IL) has recently shown impressive performance in training a
reinforcement learning agent with human demonstrations, eliminating the difficulty of …

Mapping out the Space of Human Feedback for Reinforcement Learning: A Conceptual Framework

Y Metz, D Lindner, R Baur, M El-Assady - arXiv preprint arXiv:2411.11761, 2024 - arxiv.org
Reinforcement Learning from Human Feedback (RLHF) has become a powerful tool to
fine-tune or train agentic machine learning models. Similar to how humans interact in social …

Trustworthy Machine Learning: From Algorithmic Transparency to Decision Support

U Bhatt - 2024 - repository.cam.ac.uk
Developing machine learning models worthy of decision-maker trust is crucial to using
models in practice. Algorithmic transparency tools, such as explainability and uncertainty …