Francis Rhys Ward: Academic Profile

Cited by

	All	Since 2019
Citations	72	72
h-index	5	5
i10-index	3	3

Citations per year: 2021: 8, 2022: 11, 2023: 19, 2024: 34

Public access

4 articles available
0 articles not available

Based on funding mandates

Co-authors

Ibrahim Habli, Professor of Safety-Critical Systems at the University of York (verified email at york.ac.uk)
Loic Le Folgoc, Associate Professor, Télécom Paris, France (verified email at telecom-paris.fr)
Daniel Rueckert, Technical University of Munich and Imperial College London (verified email at tum.de)
Amir Alansary, Research Associate, Biomedical Image Analysis Group (BioMedIA), Imperial College London (verified email at imperial.ac.uk)
Alexey Zakharov, University of Oxford - WhiRL (verified email at ic.ac.uk)

Francis Rhys Ward

Imperial College London

Verified email at ic.ac.uk

Research interests: AI alignment, deception, causality, reward learning, manipulation


Title	Authors	Venue	Cited by	Year
An assurance case pattern for the interpretability of machine learning in safety-critical systems	FR Ward, I Habli	Computer Safety, Reliability, and Security. SAFECOMP 2020 Workshops: DECSoS …, 2020	21	2020
Geometric deep learning for post-menstrual age prediction based on the neonatal white matter cortical surface	V Vosylius, A Wang, C Waters, A Zakharov, F Ward, L Le Folgoc, J Cupitt, ...	Uncertainty for Safe Utilization of Machine Learning in Medical Imaging, and …, 2020	16	2020
Honesty is the best policy: defining and mitigating AI deception	F Ward, F Toni, F Belardinelli, T Everitt	Advances in Neural Information Processing Systems 36, 2024	13	2024
On Agent Incentives to Manipulate Human Feedback in Multi-Agent Reward Learning Scenarios	FR Ward, F Toni, F Belardinelli	AAMAS, 1759-1761, 2022	6	2022
The reasons that agents act: Intention and instrumental goals	FR Ward, M MacDermott, F Belardinelli, F Toni, T Everitt	arXiv preprint arXiv:2402.07221, 2024	5	2024
Defining deception in structural causal games	FR Ward, F Toni, F Belardinelli	Proceedings of the 2023 International Conference on Autonomous Agents and …, 2023	4	2023
Towards defining deception in structural causal games	FR Ward	NeurIPS ML Safety Workshop, 2022	3	2022
AI Sandbagging: Language Models can Strategically Underperform on Evaluations	T van der Weij, F Hofstätter, O Jaffe, SF Brown, FR Ward	arXiv preprint arXiv:2406.07358, 2024	1	2024
Argumentative reward learning: Reasoning about human preferences	FR Ward, F Belardinelli, F Toni	arXiv preprint arXiv:2209.14010, 2022	1	2022
A Causal Perspective on AI Deception in Games	FR Ward, F Toni, F Belardinelli	AISafety@IJCAI, 2022	1	2022
Tall tales at different scales: Evaluating scaling trends for deception in language models	FR Ward, F Hofstätter, LA Thomson, HM Wood, O Jaffe, P Bartak, ...		1	
Experiments with Detecting and Mitigating AI Deception	I Sahbane, FR Ward, CH Åslund	arXiv preprint arXiv:2306.14816, 2023		2023
AGI Alignment Coursework Ethics, Privacy, AI in Society	FR Ward			
