Title | Authors | Venue | Cited by | Year |
An assurance case pattern for the interpretability of machine learning in safety-critical systems | FR Ward, I Habli | Computer Safety, Reliability, and Security. SAFECOMP 2020 Workshops: DECSoS …, 2020 | 21 | 2020 |
Geometric deep learning for post-menstrual age prediction based on the neonatal white matter cortical surface | V Vosylius, A Wang, C Waters, A Zakharov, F Ward, L Le Folgoc, J Cupitt, ... | Uncertainty for Safe Utilization of Machine Learning in Medical Imaging, and …, 2020 | 16 | 2020 |
Honesty is the best policy: defining and mitigating AI deception | F Ward, F Toni, F Belardinelli, T Everitt | Advances in Neural Information Processing Systems 36, 2024 | 13 | 2024 |
On Agent Incentives to Manipulate Human Feedback in Multi-Agent Reward Learning Scenarios | FR Ward, F Toni, F Belardinelli | AAMAS, 1759-1761, 2022 | 6 | 2022 |
The reasons that agents act: Intention and instrumental goals | FR Ward, M MacDermott, F Belardinelli, F Toni, T Everitt | arXiv preprint arXiv:2402.07221, 2024 | 5 | 2024 |
Defining deception in structural causal games | FR Ward, F Toni, F Belardinelli | Proceedings of the 2023 International Conference on Autonomous Agents and …, 2023 | 4 | 2023 |
Towards defining deception in structural causal games | FR Ward | NeurIPS ML Safety Workshop, 2022 | 3 | 2022 |
AI Sandbagging: Language Models can Strategically Underperform on Evaluations | T van der Weij, F Hofstätter, O Jaffe, SF Brown, FR Ward | arXiv preprint arXiv:2406.07358, 2024 | 1 | 2024 |
Argumentative reward learning: Reasoning about human preferences | FR Ward, F Belardinelli, F Toni | arXiv preprint arXiv:2209.14010, 2022 | 1 | 2022 |
A Causal Perspective on AI Deception in Games | FR Ward, F Toni, F Belardinelli | AISafety@IJCAI, 2022 | 1 | 2022 |
Tall tales at different scales: Evaluating scaling trends for deception in language models | FR Ward, F Hofstätter, LA Thomson, HM Wood, O Jaffe, P Bartak, ... | | 1 | |
Experiments with Detecting and Mitigating AI Deception | I Sahbane, FR Ward, CH Åslund | arXiv preprint arXiv:2306.14816, 2023 | | 2023 |
AGI Alignment Coursework Ethics, Privacy, AI in Society | FR Ward | | | |