Title | Authors | Venue | Cited by | Year |
Ensembl 2016 | A Yates, W Akanni, MR Amode, D Barrell, K Billis, D Carvalho-Silva, ... | Nucleic Acids Research 44 (D1), D710-D716, 2016 | 1640 | 2016 |
Gemini: a family of highly capable multimodal models | G Team, R Anil, S Borgeaud, Y Wu, JB Alayrac, J Yu, R Soricut, ... | arXiv preprint arXiv:2312.11805, 2023 | 806 | 2023 |
Specification gaming: the flip side of AI ingenuity | V Krakovna, J Uesato, V Mikulik, M Rahtz, T Everitt, R Kumar, Z Kenton, ... |  | 103 | 2020 |
Tracr: Compiled transformers as a laboratory for interpretability | D Lindner, J Kramár, S Farquhar, M Rahtz, T McGrath, V Mikulik | Advances in Neural Information Processing Systems 36, 2024 | 37 | 2024 |
Does circuit analysis interpretability scale? Evidence from multiple choice capabilities in Chinchilla | T Lieberum, M Rahtz, J Kramár, G Irving, R Shah, V Mikulik | arXiv preprint arXiv:2307.09458, 2023 | 32 | 2023 |
The hydra effect: Emergent self-repair in language model computations | T McGrath, M Rahtz, J Kramár, V Mikulik, S Legg | arXiv preprint arXiv:2307.15771, 2023 | 23 | 2023 |
Safe deep RL in 3D environments using human feedback | M Rahtz, V Varma, R Kumar, Z Kenton, S Legg, J Leike | arXiv preprint arXiv:2201.08102, 2022 | 7 | 2022 |
A mechanism-based approach to mitigating harms from persuasive generative AI | S El-Sayed, C Akbulut, A McCroskery, G Keeling, Z Kenton, Z Jalan, ... | arXiv preprint arXiv:2404.15058, 2024 | 4 | 2024 |
Evaluating frontier models for dangerous capabilities | M Phuong, M Aitchison, E Catt, S Cogan, A Kaskasoli, V Krakovna, ... | arXiv preprint arXiv:2403.13793, 2024 | 4 | 2024 |
An extensible interactive interface for agent design | M Rahtz, J Fang, AD Dragan, D Hadfield-Menell | arXiv preprint arXiv:1906.02641, 2019 | 1 | 2019 |