Agent Incentives: A Causal Perspective | T Everitt, R Carey, E Langlois, PA Ortega, S Legg | AAAI, 2021 | 43 | 2021 |
Incorrigibility in the CIRL Framework | R Carey | Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, 30-35, 2018 | 25* | 2018 |
Path-Specific Objectives for Safer Agent Incentives | S Farquhar, R Carey, T Everitt | AAAI, 2022 | 20 | 2022 |
Why Fair Labels Can Yield Unfair Predictions: Graphical Conditions for Introduced Unfairness | C Ashurst, R Carey, S Chiappa, T Everitt | AAAI, 2022 | 16 | 2022 |
Predicting human deliberative judgments with machine learning | O Evans, A Stuhlmüller, C Cundy, R Carey, Z Kenton, T McGrath, ... | Technical report, University of Oxford, 2018 | 15 | 2018 |
The Incentives that Shape Behaviour | R Carey, E Langlois, T Everitt, S Legg | Safe AI AAAI Workshop, 2020 | 14 | 2020 |
Reasoning about Causality in Games | L Hammond, J Fox, T Everitt, R Carey, A Abate, M Wooldridge | AI Journal, 2023 | 12 | 2023 |
Interpreting AI Compute Trends | R Carey | AI Impacts Blog, 2018 | 10 | 2018 |
Human Control: Definitions and Algorithms | R Carey, T Everitt | UAI, 2023 | 9 | 2023 |
The Effective Altruism Handbook | R Carey | The Centre for Effective Altruism, 2015 | 8 | 2015 |
PyCID: A Python Library for Causal Influence Diagrams | J Fox, T Everitt, R Carey, E Langlois, A Abate, M Wooldridge | SciPy, 2021 | 6 | 2021 |
A Complete Criterion for Value of Information in Soluble Influence Diagrams | C van Merwijk, R Carey, T Everitt | AAAI, 2022 | 3 | 2022 |
Reasoning about Causality in Games (Abstract Reprint) | L Hammond, J Fox, T Everitt, R Carey, A Abate, M Wooldridge | Proceedings of the AAAI Conference on Artificial Intelligence 38 (20), 22697 …, 2024 | | 2024 |
(When) Is Truth-telling Favored in AI Debate? | V Kovařík, R Carey | SafeAI AAAI Workshop, 2019 | | 2019 |
How useful is quantilization for mitigating specification-gaming? | R Carey | SafeML Workshop at International Conference on Learning Representations, 2019 | | 2019 |