Matthew Rahtz
Google DeepMind
Verified email at google.com
Title
Cited by
Year
Ensembl 2016
A Yates, W Akanni, MR Amode, D Barrell, K Billis, D Carvalho-Silva, ...
Nucleic acids research 44 (D1), D710-D716, 2016
Cited by: 1640, Year: 2016
Gemini: a family of highly capable multimodal models
G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ...
arXiv preprint arXiv:2312.11805, 2023
Cited by: 806, Year: 2023
Specification gaming: the flip side of AI ingenuity
V Krakovna, J Uesato, V Mikulik, M Rahtz, T Everitt, R Kumar, Z Kenton, ...
Cited by: 103, Year: 2020
Tracr: Compiled transformers as a laboratory for interpretability
D Lindner, J Kramár, S Farquhar, M Rahtz, T McGrath, V Mikulik
Advances in Neural Information Processing Systems 36, 2024
Cited by: 37, Year: 2024
Does circuit analysis interpretability scale? Evidence from multiple choice capabilities in Chinchilla
T Lieberum, M Rahtz, J Kramár, G Irving, R Shah, V Mikulik
arXiv preprint arXiv:2307.09458, 2023
Cited by: 32, Year: 2023
The hydra effect: Emergent self-repair in language model computations
T McGrath, M Rahtz, J Kramár, V Mikulik, S Legg
arXiv preprint arXiv:2307.15771, 2023
Cited by: 23, Year: 2023
Safe deep RL in 3D environments using human feedback
M Rahtz, V Varma, R Kumar, Z Kenton, S Legg, J Leike
arXiv preprint arXiv:2201.08102, 2022
Cited by: 7, Year: 2022
A mechanism-based approach to mitigating harms from persuasive generative AI
S El-Sayed, C Akbulut, A McCroskery, G Keeling, Z Kenton, Z Jalan, ...
arXiv preprint arXiv:2404.15058, 2024
Cited by: 4, Year: 2024
Evaluating frontier models for dangerous capabilities
M Phuong, M Aitchison, E Catt, S Cogan, A Kaskasoli, V Krakovna, ...
arXiv preprint arXiv:2403.13793, 2024
Cited by: 4, Year: 2024
An extensible interactive interface for agent design
M Rahtz, J Fang, AD Dragan, D Hadfield-Menell
arXiv preprint arXiv:1906.02641, 2019
Cited by: 1, Year: 2019