Title, authors, venue | Cited by | Year |
Concrete problems in AI safety D Amodei, C Olah, J Steinhardt, P Christiano, J Schulman, D Mané arXiv preprint arXiv:1606.06565, 2016 | 2643 | 2016 |
Measuring massive multitask language understanding D Hendrycks, C Burns, S Basart, A Zou, M Mazeika, D Song, J Steinhardt arXiv preprint arXiv:2009.03300, 2020 | 1389 | 2020 |
The many faces of robustness: A critical analysis of out-of-distribution generalization D Hendrycks, S Basart, N Mu, S Kadavath, F Wang, E Dorundo, R Desai, ... Proceedings of the IEEE/CVF international conference on computer vision …, 2021 | 1342 | 2021 |
Natural adversarial examples D Hendrycks, K Zhao, S Basart, J Steinhardt, D Song Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021 | 1258 | 2021 |
Certified defenses against adversarial examples A Raghunathan, J Steinhardt, P Liang arXiv preprint arXiv:1801.09344, 2018 | 1067 | 2018 |
The malicious use of artificial intelligence: Forecasting, prevention, and mitigation M Brundage, S Avin, J Clark, H Toner, P Eckersley, B Garfinkel, A Dafoe, ... arXiv preprint arXiv:1802.07228, 2018 | 987 | 2018 |
Certified defenses for data poisoning attacks J Steinhardt, PW Koh, PS Liang Advances in neural information processing systems 30, 2017 | 837 | 2017 |
Measuring mathematical problem solving with the MATH dataset D Hendrycks, C Burns, S Kadavath, A Arora, S Basart, E Tang, D Song, ... arXiv preprint arXiv:2103.03874, 2021 | 593 | 2021 |
Semidefinite relaxations for certifying robustness to adversarial examples A Raghunathan, J Steinhardt, PS Liang Advances in neural information processing systems 31, 2018 | 472 | 2018 |
Troubling Trends in Machine Learning Scholarship: Some ML papers suffer from flaws that could mislead the public and stymie future research. ZC Lipton, J Steinhardt Queue 17 (1), 45-77, 2019 | 352 | 2019 |
Scaling out-of-distribution detection for real-world settings D Hendrycks, S Basart, M Mazeika, A Zou, J Kwon, M Mostajabi, ... arXiv preprint arXiv:1911.11132, 2019 | 341 | 2019 |
Measuring coding challenge competence with APPS D Hendrycks, S Basart, S Kadavath, M Mazeika, A Arora, E Guo, C Burns, ... arXiv preprint arXiv:2105.09938, 2021 | 330 | 2021 |
Jailbroken: How does LLM safety training fail? A Wei, N Haghtalab, J Steinhardt Advances in Neural Information Processing Systems 36, 2024 | 325 | 2024 |
Learning from untrusted data M Charikar, J Steinhardt, G Valiant Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing …, 2017 | 324 | 2017 |
Aligning AI with shared human values D Hendrycks, C Burns, S Basart, A Critch, J Li, D Song, J Steinhardt arXiv preprint arXiv:2008.02275, 2020 | 311 | 2020 |
Sever: A robust meta-algorithm for stochastic optimization I Diakonikolas, G Kamath, D Kane, J Li, J Steinhardt, A Stewart International Conference on Machine Learning, 1596-1606, 2019 | 310 | 2019 |
SONYC: A system for monitoring, analyzing, and mitigating urban noise pollution JP Bello, C Silva, O Nov, RL Dubois, A Arora, J Salamon, C Mydlarz, ... Communications of the ACM 62 (2), 68-77, 2019 | 309 | 2019 |
Unsolved problems in ML safety D Hendrycks, N Carlini, J Schulman, J Steinhardt arXiv preprint arXiv:2109.13916, 2021 | 259 | 2021 |
Stronger data poisoning attacks break data sanitization defenses PW Koh, J Steinhardt, P Liang Machine Learning, 1-47, 2022 | 244 | 2022 |
Rethinking bias-variance trade-off for generalization of neural networks Z Yang, Y Yu, C You, J Steinhardt, Y Ma International Conference on Machine Learning, 10767-10777, 2020 | 194 | 2020 |