Jeffrey Ladish Academic Profile - Scholarly Resource Search

Cited by

	All	Since 2019
Citations	1332	1329
h-index	7	7
i10-index	7	7

Citations per year: 2022: 6; 2023: 382; 2024: 932

Co-authors

Jens Mache, Professor of Computer Science, Lewis & Clark College. Verified email at lclark.edu

Jeffrey Ladish

Other names: Jeff Ladish

Executive Director, Palisade Research

Verified email at palisaderesearch.org


Title	Cited by	Year
Constitutional ai: Harmlessness from ai feedback Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ... arXiv preprint arXiv:2212.08073, 2022	1107	2022
Measuring progress on scalable oversight for large language models SR Bowman, J Hyun, E Perez, E Chen, C Pettit, S Heiner, K Lukošiūtė, ... arXiv preprint arXiv:2211.03540, 2022	86	2022
Lora fine-tuning efficiently undoes safety training in llama 2-chat 70b S Lermen, C Rogers-Smith, J Ladish arXiv preprint arXiv:2310.20624, 2023	67	2023
Badllama: cheaply removing safety fine-tuning from llama 2-chat 13b P Gade, S Lermen, C Rogers-Smith, J Ladish arXiv preprint arXiv:2311.00117, 2023	19	2023
Constitutional AI: harmlessness from AI feedback. 2022 Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ... arXiv preprint arXiv:2212.08073, 2022	18	2022
Open problems in technical ai governance A Reuel, B Bucknall, S Casper, T Fist, L Soder, O Aarne, L Hammond, ... arXiv preprint arXiv:2407.14981, 2024	11	2024
Constitutional ai: Harmlessness from ai feedback. arXiv 2022 Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ... arXiv preprint arXiv:2212.08073, 2023	10	2023
Hands-on cybersecurity exercises for introductory classes: tutorial presentation R Weiss, J Ladish, J Mache, ME Locasto Journal of Computing Sciences in Colleges 32 (1), 173-175, 2016	5	2016
Constitutional AI: Harmlessness from AI Feedback, December 2022 Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ... URL http://arxiv.org/abs/2212.08073 1, 0	5
Information security considerations for AI and the long term future J Ladish, L Heim URL: https://blog. heim. xyz/information-securityconsiderations-for-ai …, 2022	4	2022
Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits A Draguns, A Gritsevskiy, SR Motwani, C Rogers-Smith, J Ladish, ... arXiv preprint arXiv:2406.02619, 2024		2024


Articles 1–11