Daniel Paleka
Verified email at inf.ethz.ch
Title
Cited by
Year
Poisoning Web-Scale Training Datasets is Practical
N Carlini, M Jagielski, CA Choquette-Choo, D Paleka, W Pearce, ...
arXiv preprint arXiv:2302.10149, 2023
Cited by: 102, Year: 2023
Red-Teaming the Stable Diffusion Safety Filter
J Rando, D Paleka, D Lindner, L Heim, F Tramèr
arXiv preprint arXiv:2210.04610, 2022
Cited by: 84, Year: 2022
ARB: Advanced Reasoning Benchmark for Large Language Models
T Sawada, D Paleka, A Havrilla, P Tadepalli, P Vidas, A Kranias, JJ Nay, ...
arXiv preprint arXiv:2307.13692, 2023
Cited by: 34, Year: 2023
Foundational Challenges in Assuring Alignment and Safety of Large Language Models
U Anwar, A Saparov, J Rando, D Paleka, M Turpin, P Hase, ES Lubana, ...
arXiv preprint arXiv:2404.09932, 2024
Cited by: 26, Year: 2024
Evaluating Superhuman Models with Consistency Checks
L Fluri*, D Paleka*, F Tramèr
arXiv preprint arXiv:2306.09983, 2023
Cited by: 16, Year: 2023
Stealing Part of a Production Language Model
N Carlini, D Paleka, KD Dvijotham, T Steinke, J Hayase, AF Cooper, ...
arXiv preprint arXiv:2403.06634, 2024
Cited by: 15, Year: 2024
A law of adversarial risk, interpolation, and label noise
D Paleka, A Sanyal
arXiv preprint arXiv:2207.03933, 2022
Cited by: 8, Year: 2022
Injectivity of ReLU neural networks at initialization
D Paleka
ETH Zurich, 2021
Cited by: 1, Year: 2021
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition
E Debenedetti*, J Rando*, D Paleka*, FF Silaghi, D Albastroiu, N Cohen, ...
arXiv e-prints, arXiv:2406.07954, 2024
Year: 2024