On the Reliability of Watermarks for Large Language Models J Kirchenbauer, J Geiping, Y Wen, M Shu, K Saifullah, K Kong, ... arXiv preprint arXiv:2306.04634, 2023 | 94 | 2023 |
DALL·E mini B Dayma, S Patil, P Cuenca, K Saifullah, T Abraham, P Le Khac, L Melas, ... July, 2021 | 86* | 2021 |
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models N Jain, K Saifullah, Y Wen, J Kirchenbauer, M Shu, A Saha, M Goldblum, ... arXiv preprint arXiv:2306.13651, 2023 | 16 | 2023 |
Coercing LLMs to do and reveal (almost) anything J Geiping, A Stein, M Shu, K Saifullah, Y Wen, T Goldstein arXiv preprint arXiv:2402.14020, 2024 | 12 | 2024 |
CinePile: A Long Video Question Answering Dataset and Benchmark R Rawal, K Saifullah, R Basri, D Jacobs, G Somepalli, T Goldstein arXiv preprint arXiv:2405.08813, 2024 | 5 | 2024 |
Seeing in Words: Learning to Classify through Language Bottlenecks K Saifullah, Y Wen, J Geiping, M Goldblum, T Goldstein arXiv preprint arXiv:2307.00028, 2023 | 1 | 2023 |
Learning UI-to-Code Reverse Generator Using Visual Critic Without Rendering D Soselia, K Saifullah, T Zhou arXiv preprint arXiv:2305.14637, 2023 | 1 | 2023 |
LiveBench: A Challenging, Contamination-Free LLM Benchmark C White, S Dooley, M Roberts, A Pal, B Feuer, S Jain, R Shwartz-Ziv, ... arXiv preprint arXiv:2406.19314, 2024 | | 2024 |