Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

DataComp: In search of the next generation of multimodal datasets

SY Gadre, G Ilharco, A Fang… - Advances in …, 2024 - proceedings.neurips.cc
Multimodal datasets are a critical component in recent breakthroughs such as CLIP, Stable
Diffusion and GPT-4, yet their design does not receive the same research attention as model …

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

The curse of recursion: Training on generated data makes models forget

I Shumailov, Z Shumaylov, Y Zhao, Y Gal… - arXiv preprint arXiv …, 2023 - arxiv.org
Stable Diffusion revolutionised image creation from descriptive text. GPT-2, GPT-3(.5) and
GPT-4 demonstrated astonishing performance across a variety of language tasks. ChatGPT …

Trustworthy LLMs: A survey and guideline for evaluating large language models' alignment

Y Liu, Y Yao, JF Ton, X Zhang, R Guo, H Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
Ensuring alignment, which refers to making models behave in accordance with human
intentions [1, 2], has become a critical task before deploying large language models (LLMs) …

On the exploitability of instruction tuning

M Shu, J Wang, C Zhu, J Geiping… - Advances in Neural …, 2023 - proceedings.neurips.cc
Instruction tuning is an effective technique to align large language models (LLMs) with
human intent. In this work, we investigate how an adversary can exploit instruction tuning by …

Identifying and mitigating the security risks of generative AI

C Barrett, B Boyd, E Bursztein, N Carlini… - … and Trends® in …, 2023 - nowpublishers.com
Every major technical invention resurfaces the dual-use dilemma—the new technology has
the potential to be used for good as well as for harm. Generative AI (GenAI) techniques, such …

A survey of safety and trustworthiness of large language models through the lens of verification and validation

X Huang, W Ruan, W Huang, G Jin, Y Dong… - Artificial Intelligence …, 2024 - Springer
Large language models (LLMs) have set off a new wave of interest in AI with their ability to
engage end-users in human-level conversations with detailed and articulate answers across …

Provable robust watermarking for AI-generated text

X Zhao, P Ananth, L Li, YX Wang - arXiv preprint arXiv:2306.17439, 2023 - arxiv.org
As AI-generated text increasingly resembles human-written content, the ability to detect
machine-generated text becomes crucial. To address this challenge, we present …

Rethinking machine unlearning for large language models

S Liu, Y Yao, J Jia, S Casper, N Baracaldo… - arXiv preprint arXiv …, 2024 - arxiv.org
We explore machine unlearning (MU) in the domain of large language models (LLMs),
referred to as LLM unlearning. This initiative aims to eliminate undesirable data influence …