Poisoning Web-Scale Training Datasets is Practical. N Carlini, M Jagielski, CA Choquette-Choo, D Paleka, W Pearce, et al. arXiv preprint arXiv:2302.10149, 2023. Cited by 102.
Red-Teaming the Stable Diffusion Safety Filter. J Rando, D Paleka, D Lindner, L Heim, F Tramèr. arXiv preprint arXiv:2210.04610, 2022. Cited by 84.
ARB: Advanced Reasoning Benchmark for Large Language Models. T Sawada, D Paleka, A Havrilla, P Tadepalli, P Vidas, A Kranias, JJ Nay, et al. arXiv preprint arXiv:2307.13692, 2023. Cited by 34.
Foundational Challenges in Assuring Alignment and Safety of Large Language Models. U Anwar, A Saparov, J Rando, D Paleka, M Turpin, P Hase, ES Lubana, et al. arXiv preprint arXiv:2404.09932, 2024. Cited by 26.
Evaluating Superhuman Models with Consistency Checks. L Fluri*, D Paleka*, F Tramèr. arXiv preprint arXiv:2306.09983, 2023. Cited by 16.
Stealing Part of a Production Language Model. N Carlini, D Paleka, KD Dvijotham, T Steinke, J Hayase, AF Cooper, et al. arXiv preprint arXiv:2403.06634, 2024. Cited by 15.
A law of adversarial risk, interpolation, and label noise. D Paleka, A Sanyal. arXiv preprint arXiv:2207.03933, 2022. Cited by 8.
Injectivity of ReLU neural networks at initialization. D Paleka. ETH Zurich, 2021. Cited by 1.
Dataset and Lessons Learned from the 2024 SaTML LLM Capture-the-Flag Competition. E Debenedetti*, J Rando*, D Paleka*, FF Silaghi, D Albastroiu, N Cohen, et al. arXiv preprint arXiv:2406.07954, 2024.