Jeffrey Ladish
Other names: Jeff Ladish
Executive Director, Palisade Research
Verified email at palisaderesearch.org
Title | Cited by | Year
Constitutional AI: Harmlessness from AI feedback
Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ...
arXiv preprint arXiv:2212.08073, 2022
Cited by: 1107 | Year: 2022
Measuring progress on scalable oversight for large language models
SR Bowman, J Hyun, E Perez, E Chen, C Pettit, S Heiner, K Lukošiūtė, ...
arXiv preprint arXiv:2211.03540, 2022
Cited by: 86 | Year: 2022
LoRA fine-tuning efficiently undoes safety training in Llama 2-Chat 70B
S Lermen, C Rogers-Smith, J Ladish
arXiv preprint arXiv:2310.20624, 2023
Cited by: 67 | Year: 2023
BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B
P Gade, S Lermen, C Rogers-Smith, J Ladish
arXiv preprint arXiv:2311.00117, 2023
Cited by: 19 | Year: 2023
Constitutional AI: Harmlessness from AI feedback. 2022
Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ...
arXiv preprint arXiv:2212.08073, 2022
Cited by: 18 | Year: 2022
Open problems in technical AI governance
A Reuel, B Bucknall, S Casper, T Fist, L Soder, O Aarne, L Hammond, ...
arXiv preprint arXiv:2407.14981, 2024
Cited by: 11 | Year: 2024
Constitutional AI: Harmlessness from AI feedback. arXiv 2022
Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ...
arXiv preprint arXiv:2212.08073, 2023
Cited by: 10 | Year: 2023
Hands-on cybersecurity exercises for introductory classes: tutorial presentation
R Weiss, J Ladish, J Mache, ME Locasto
Journal of Computing Sciences in Colleges 32 (1), 173-175, 2016
Cited by: 5 | Year: 2016
Constitutional AI: Harmlessness from AI Feedback, December 2022
Y Bai, S Kadavath, S Kundu, A Askell, J Kernion, A Jones, A Chen, ...
URL http://arxiv.org/abs/2212.08073 1, 0
Cited by: 5
Information security considerations for AI and the long term future
J Ladish, L Heim
URL: https://blog.heim.xyz/information-securityconsiderations-for-ai …, 2022
Cited by: 4 | Year: 2022
Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits
A Draguns, A Gritsevskiy, SR Motwani, C Rogers-Smith, J Ladish, ...
arXiv preprint arXiv:2406.02619, 2024
Year: 2024
Articles 1–11