Lawrence Chan: Academic Profile

Cited by

	All	Since 2019
Citations	621	615
h-index	10	10
i10-index	11	11

Citations per year (bar chart):
Year	2019	2020	2021	2022	2023	2024
Citations	2	9	13	37	282	269

Open-access publications

Available: 2 articles
Not available: 0 articles

Based on funding mandates

Co-authors

Neel Nanda: Research Engineer, Google DeepMind (verified email at deepmind.com)
Anca D Dragan: Assistant Professor at UC Berkeley // Director, AI Safety and Alignment, Google DeepMind (verified email at berkeley.edu)
Jacob Steinhardt: Stanford University (verified email at cs.stanford.edu)
Richard Ngo: OpenAI (verified email at openai.com)
Dylan Hadfield-Menell: Massachusetts Institute of Technology (verified email at csail.mit.edu)
Siddhartha Srinivasa: Professor, University of Washington (verified email at cs.washington.edu)
Andrew Critch: UC Berkeley, Department of Electrical Engineering and Computer Sciences (verified email at eecs.berkeley.edu)
Adam Scherlis (verified email at scherlis.com)
Daniel M. Ziegler: Redwood Research (verified email at rdwrs.com)
Noa Nabeshima: UC Santa Barbara (verified email at ucsb.edu)
Ben Weinstein-Raun: Palisade Research, AI Impacts (verified email at benwr.net)
Tim Bauman: Surge AI (verified email at surgehq.ai)
Rachel Freedman: UC Berkeley (verified email at berkeley.edu)
Rohin Shah: Research Scientist, Google DeepMind (verified email at deepmind.com)
Stuart Russell: Professor of Computer Science, University of California, Berkeley (verified email at cs.berkeley.edu)
Michael Dennis: Google DeepMind (verified email at cs.berkeley.edu)
Dmitrii Krasheninnikov: University of Cambridge (verified email at cam.ac.uk)
Daniel S. Brown: Assistant Professor, Robotics Center and School of Computing, University of Utah (verified email at cs.utah.edu)
Euan McLean: FAR AI (verified email at far.ai)
Pedro Freire: UK Office of Communications (verified email at aston.ac.uk)


Lawrence Chan

PhD Student, UC Berkeley

Verified email at berkeley.edu

AI Alignment, Interpretability, Reward Learning


Title	Cited by	Year
Progress measures for grokking via mechanistic interpretability. N Nanda, L Chan, T Lieberum, J Smith, J Steinhardt. ICLR 2023, 2023	208	2023
The alignment problem from a deep learning perspective. R Ngo, L Chan, S Mindermann. arXiv preprint arXiv:2209.00626, 2022	120	2022
A toy model of universality: Reverse engineering how networks learn group operations. B Chughtai, L Chan, N Nanda. ICML 2023, 2023	54	2023
The assistive multi-armed bandit. L Chan, D Hadfield-Menell, S Srinivasa, A Dragan. 2019 14th ACM/IEEE International Conference on Human-Robot Interaction (HRI …, 2019	48	2019
Causal Scrubbing: a method for rigorously testing interpretability hypotheses. L Chan, A Garriga-Alonso, N Goldowsky-Dill, R Greenblatt, ... https://www.alignmentforum.org/posts/JvZhhzycHu2Yd57RN/causal-scrubbing-a …, 2022	39	2022
Adversarial Training for High-Stakes Reliability. DM Ziegler, S Nix, L Chan, T Bauman, P Schmidt-Nielsen, T Lin, ... NeurIPS 2022, 2022	37	2022
Benefits of assistance over reward learning. R Shah, P Freire, N Alex, R Freedman, D Krasheninnikov, L Chan, ...	28	2020
Evaluating Language-Model Agents on Realistic Autonomous Tasks. M Kinniment, LJK Sato, H Du, B Goodrich, M Hasin, L Chan, LH Miles, ... https://evals.alignment.org/Evaluating_LMAs_Realistic_Tasks.pdf, 2023	19	2023
Human irrationality: both bad and good for reward inference. L Chan, A Critch, A Dragan. arXiv preprint arXiv:2111.06956, 2021	18	2021
Optimal cost design for model predictive control. A Jain, L Chan, DS Brown, AD Dragan. Learning for Dynamics and Control, 1205-1217, 2021	17	2021
The alignment problem from a deep learning perspective. arXiv. R Ngo, L Chan, S Mindermann. URL: http://arxiv.org/abs/2209.00626, 2023	10	2023
Progress measures for grokking via mechanistic interpretability, January 2023. N Nanda, L Chan, T Lieberum, J Smith, J Steinhardt. arXiv preprint arXiv:2301.05217	7
Language models are better than humans at next-token prediction. B Shlegeris, F Roger, L Chan, E McLean. arXiv preprint arXiv:2212.11281, 2022	4	2022
A study on autonomous hole machining process analysis by reverse engineering of NC programs. X Yan, L Chan, K Yamazaki, J Liu, M Kubota, Y Amano. SAE transactions, 1045-1051, 1999	4	1999
The alignment problem from a deep learning perspective: A position paper. R Ngo, L Chan, S Mindermann. The Twelfth International Conference on Learning Representations, 2024	3	2024
Neural networks learn representation theory: Reverse engineering how networks perform group operations. B Chughtai, L Chan, N Nanda. ICLR 2023 Workshop on Physics for Machine Learning, 2023	3	2023
Autonomous machining process analyzer. LC Chan. University of California, Davis, 1998	1	1998
The impacts of known and unknown demonstrator irrationality on reward inference. L Chan, A Critch, A Dragan	1
Provable Guarantees for Model Performance via Mechanistic Interpretability. J Gross, R Agrawal, T Kwa, E Ong, CH Yip, A Gibson, S Noubir, L Chan. arXiv preprint arXiv:2406.11779, 2024		2024
Accounting for Human Learning when Inferring Human Preferences. H Giles, L Chan. arXiv preprint arXiv:2011.05596, 2020		2020
