Aligning AI with shared human values D Hendrycks, C Burns, S Basart, A Critch, J Li, D Song, J Steinhardt arXiv preprint arXiv:2008.02275, 2020 | 363 | 2020 |
Emergent complexity and zero-shot transfer via unsupervised environment design M Dennis, N Jaques, E Vinitsky, A Bayen, S Russell, A Critch, S Levine Advances in neural information processing systems 33, 13049-13061, 2020 | 203 | 2020 |
Alignment for advanced machine learning systems J Taylor, E Yudkowsky, P LaVictoire, A Critch Ethics of artificial intelligence, 342-382, 2016 | 128 | 2016 |
Optimal policies tend to seek power AM Turner, L Smith, R Shah, A Critch, P Tadepalli arXiv preprint arXiv:1912.01683, 2019 | 69 | 2019 |
AI research considerations for human existential safety (ARCHES) A Critch, D Krueger arXiv preprint arXiv:2006.04948, 2020 | 54 | 2020 |
A note on the proportionality between some consistency indices in the AHP M Brunelli, A Critch, M Fedrizzi Applied Mathematics and Computation 219 (14), 7901-7906, 2013 | 54 | 2013 |
Logical induction S Garrabrant, T Benson-Tilsen, A Critch, N Soares, J Taylor arXiv preprint arXiv:1609.03543, 2016 | 51 | 2016 |
The MAGICAL benchmark for robust imitation S Toyer, R Shah, A Critch, S Russell Advances in Neural Information Processing Systems 33, 18284-18295, 2020 | 47 | 2020 |
Algebraic geometry of matrix product states A Critch, J Morton SIGMA. Symmetry, Integrability and Geometry: Methods and Applications 10, 095, 2014 | 45 | 2014 |
Clusterability in neural networks D Filan, S Casper, S Hod, C Wild, A Critch, S Russell arXiv preprint arXiv:2103.03386, 2021 | 29 | 2021 |
A parametric, resource-bounded generalization of Löb’s theorem, and a robust cooperation criterion for open-source game theory A Critch The Journal of Symbolic Logic 84 (4), 1368-1381, 2019 | 23 | 2019 |
Human irrationality: both bad and good for reward inference L Chan, A Critch, A Dragan arXiv preprint arXiv:2111.06956, 2021 | 22 | 2021 |
TASRA: a taxonomy and analysis of societal-scale risks from AI A Critch, S Russell arXiv preprint arXiv:2306.06924, 2023 | 20 | 2023 |
Pruned neural networks are surprisingly modular D Filan, S Hod, C Wild, A Critch, S Russell arXiv preprint arXiv:2003.04881, 2020 | 19 | 2020 |
Toward negotiable reinforcement learning: shifting priorities in Pareto optimal sequential decision-making A Critch arXiv preprint arXiv:1701.01302, 2017 | 15 | 2017 |
Negotiable reinforcement learning for Pareto optimal sequential decision-making N Desai, A Critch, SJ Russell Advances in Neural Information Processing Systems 31, 2018 | 14 | 2018 |
Algebraic geometry of hidden Markov and related models AJ Critch University of California, Berkeley, 2013 | 11 | 2013 |
A formal approach to the problem of logical non-omniscience S Garrabrant, T Benson-Tilsen, A Critch, N Soares, J Taylor arXiv preprint arXiv:1707.08747, 2017 | 10 | 2017 |
Graphical clusterability and local specialization in deep neural networks S Casper, S Hod, D Filan, C Wild, A Critch, S Russell ICLR 2022 Workshop on PAIR^2Struct: Privacy …, 2022 | 9 | 2022 |
Quantifying local specialization in deep neural networks S Hod, D Filan, S Casper, A Critch, S Russell arXiv preprint arXiv:2110.08058, 2021 | 9 | 2021 |