Toward trustworthy AI development: mechanisms for supporting verifiable claims M Brundage, S Avin, J Wang, H Belfield, G Krueger, G Hadfield, H Khlaaf, ... arXiv preprint arXiv:2004.07213, 2020 | 359 | 2020 |
Sleeper agents: Training deceptive LLMs that persist through safety training E Hubinger, C Denison, J Mu, M Lambert, M Tong, M MacDiarmid, ... arXiv preprint arXiv:2401.05566, 2024 | 31 | 2024 |
Multiverse: causal reasoning using importance sampling in probabilistic programming Y Perov, L Graham, K Gourgoulias, J Richens, C Lee, A Baker, S Johri Symposium on Advances in Approximate Bayesian Inference, 1-36, 2020 | 25 | 2020 |
Inferring work task automatability from AI expert evidence P Duckworth, L Graham, M Osborne Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society, 485-491, 2019 | 17 | 2019 |
Copy, paste, infer: a robust analysis of twin networks for counterfactual inference L Graham, CM Lee, Y Perov NeurIPS 2019 CausalML workshop, 2019 | 4 | 2019 |
Causal Reasoning and Counterfactual Probabilistic Programming Framework Using Approximate Inference I Perov, LCS Graham, K Gourgoulias, JG Richens, CM Lee, AP Baker, ... US Patent App. 16/944,512, 2021 | 1 | 2021 |
Interpretable causal systems: interpretability and causality in machine learning for human and nonhuman decision-making L Graham University of Oxford, 2020 | | 2020 |