Progress measures for grokking via mechanistic interpretability N Nanda, L Chan, T Lieberum, J Smith, J Steinhardt ICLR 2023, 2023 | 208 | 2023 |
The alignment problem from a deep learning perspective R Ngo, L Chan, S Mindermann arXiv preprint arXiv:2209.00626, 2022 | 120 | 2022 |
A toy model of universality: Reverse engineering how networks learn group operations B Chughtai, L Chan, N Nanda ICML 2023, 2023 | 54 | 2023 |
The assistive multi-armed bandit L Chan, D Hadfield-Menell, S Srinivasa, A Dragan 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI …, 2019 | 48 | 2019 |
Causal Scrubbing: a method for rigorously testing interpretability hypotheses L Chan, A Garriga-Alonso, N Goldowsky-Dill, R Greenblatt, ... https://www.alignmentforum.org/posts/JvZhhzycHu2Yd57RN/causal-scrubbing-a …, 2022 | 39 | 2022 |
Adversarial Training for High-Stakes Reliability DM Ziegler, S Nix, L Chan, T Bauman, P Schmidt-Nielsen, T Lin, ... NeurIPS 2022, 2022 | 37 | 2022 |
Benefits of assistance over reward learning R Shah, P Freire, N Alex, R Freedman, D Krasheninnikov, L Chan, ... | 28 | 2020 |
Evaluating Language-Model Agents on Realistic Autonomous Tasks M Kinniment, LJK Sato, H Du, B Goodrich, M Hasin, L Chan, LH Miles, ... https://evals.alignment.org/Evaluating_LMAs_Realistic_Tasks.pdf, 2023 | 19 | 2023 |
Human irrationality: both bad and good for reward inference L Chan, A Critch, A Dragan arXiv preprint arXiv:2111.06956, 2021 | 18 | 2021 |
Optimal cost design for model predictive control A Jain, L Chan, DS Brown, AD Dragan Learning for Dynamics and Control, 1205-1217, 2021 | 17 | 2021 |
The alignment problem from a deep learning perspective. arXiv R Ngo, L Chan, S Mindermann URL: http://arxiv.org/abs/2209.00626, 2023 | 10 | 2023 |
Progress measures for grokking via mechanistic interpretability, January 2023 N Nanda, L Chan, T Lieberum, J Smith, J Steinhardt arXiv preprint arXiv:2301.05217 | 7 | |
Language models are better than humans at next-token prediction B Shlegeris, F Roger, L Chan, E McLean arXiv preprint arXiv:2212.11281, 2022 | 4 | 2022 |
A study on autonomous hole machining process analysis by reverse engineering of NC programs X Yan, L Chan, K Yamazaki, J Liu, M Kubota, Y Amano SAE transactions, 1045-1051, 1999 | 4 | 1999 |
The alignment problem from a deep learning perspective: A position paper R Ngo, L Chan, S Mindermann The Twelfth International Conference on Learning Representations, 2024 | 3 | 2024 |
Neural networks learn representation theory: Reverse engineering how networks perform group operations B Chughtai, L Chan, N Nanda ICLR 2023 Workshop on Physics for Machine Learning, 2023 | 3 | 2023 |
Autonomous machining process analyzer LC Chan University of California, Davis, 1998 | 1 | 1998 |
The impacts of known and unknown demonstrator irrationality on reward inference L Chan, A Critch, A Dragan | 1 | |
Provable Guarantees for Model Performance via Mechanistic Interpretability J Gross, R Agrawal, T Kwa, E Ong, CH Yip, A Gibson, S Noubir, L Chan arXiv preprint arXiv:2406.11779, 2024 | | 2024 |
Accounting for Human Learning when Inferring Human Preferences H Giles, L Chan arXiv preprint arXiv:2011.05596, 2020 | | 2020 |