Lawrence Chan
PhD Student, UC Berkeley
Verified email at berkeley.edu
Title
Cited by
Year
Progress measures for grokking via mechanistic interpretability
N Nanda, L Chan, T Lieberum, J Smith, J Steinhardt
ICLR 2023, 2023
Cited by: 208; Year: 2023
The alignment problem from a deep learning perspective
R Ngo, L Chan, S Mindermann
arXiv preprint arXiv:2209.00626, 2022
Cited by: 120; Year: 2022
A toy model of universality: Reverse engineering how networks learn group operations
B Chughtai, L Chan, N Nanda
ICML 2023, 2023
Cited by: 54; Year: 2023
The assistive multi-armed bandit
L Chan, D Hadfield-Menell, S Srinivasa, A Dragan
2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI …, 2019
Cited by: 48; Year: 2019
Causal Scrubbing: a method for rigorously testing interpretability hypotheses
L Chan, A Garriga-Alonso, N Goldowsky-Dill, R Greenblatt, ...
https://www.alignmentforum.org/posts/JvZhhzycHu2Yd57RN/causal-scrubbing-a …, 2022
Cited by: 39; Year: 2022
Adversarial Training for High-Stakes Reliability
DM Ziegler, S Nix, L Chan, T Bauman, P Schmidt-Nielsen, T Lin, ...
NeurIPS 2022, 2022
Cited by: 37; Year: 2022
Benefits of assistance over reward learning
R Shah, P Freire, N Alex, R Freedman, D Krasheninnikov, L Chan, ...
Cited by: 28; Year: 2020
Evaluating Language-Model Agents on Realistic Autonomous Tasks
M Kinniment, LJK Sato, H Du, B Goodrich, M Hasin, L Chan, LH Miles, ...
https://evals.alignment.org/Evaluating_LMAs_Realistic_Tasks.pdf, 2023
Cited by: 19; Year: 2023
Human irrationality: both bad and good for reward inference
L Chan, A Critch, A Dragan
arXiv preprint arXiv:2111.06956, 2021
Cited by: 18; Year: 2021
Optimal cost design for model predictive control
A Jain, L Chan, DS Brown, AD Dragan
Learning for Dynamics and Control, 1205-1217, 2021
Cited by: 17; Year: 2021
The alignment problem from a deep learning perspective. arXiv
R Ngo, L Chan, S Mindermann
URL: http://arxiv.org/abs/2209.00626, 2023
Cited by: 10; Year: 2023
Progress measures for grokking via mechanistic interpretability, January 2023
N Nanda, L Chan, T Lieberum, J Smith, J Steinhardt
arXiv preprint arXiv:2301.05217
Cited by: 7
Language models are better than humans at next-token prediction
B Shlegeris, F Roger, L Chan, E McLean
arXiv preprint arXiv:2212.11281, 2022
Cited by: 4; Year: 2022
A study on autonomous hole machining process analysis by reverse engineering of NC programs
X Yan, L Chan, K Yamazaki, J Liu, M Kubota, Y Amano
SAE transactions, 1045-1051, 1999
Cited by: 4; Year: 1999
The alignment problem from a deep learning perspective: A position paper
R Ngo, L Chan, S Mindermann
The Twelfth International Conference on Learning Representations, 2024
Cited by: 3; Year: 2024
Neural networks learn representation theory: Reverse engineering how networks perform group operations
B Chughtai, L Chan, N Nanda
ICLR 2023 Workshop on Physics for Machine Learning, 2023
Cited by: 3; Year: 2023
Autonomous machining process analyzer
LC Chan
University of California, Davis, 1998
Cited by: 1; Year: 1998
The impacts of known and unknown demonstrator irrationality on reward inference
L Chan, A Critch, A Dragan
Cited by: 1
Provable Guarantees for Model Performance via Mechanistic Interpretability
J Gross, R Agrawal, T Kwa, E Ong, CH Yip, A Gibson, S Noubir, L Chan
arXiv preprint arXiv:2406.11779, 2024
Year: 2024
Accounting for Human Learning when Inferring Human Preferences
H Giles, L Chan
arXiv preprint arXiv:2011.05596, 2020
Year: 2020