Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Risk taxonomy, mitigation, and assessment benchmarks of large language model systems

T Cui, Y Wang, C Fu, Y Xiao, S Li, X Deng, Y Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have strong capabilities in solving diverse natural language
processing tasks. However, the safety and security issues of LLM systems have become the …

Generative AI security: challenges and countermeasures

B Zhu, N Mu, J Jiao, D Wagner - arXiv preprint arXiv:2402.12617, 2024 - arxiv.org

Watermark Smoothing Attacks against Language Models

H Chang, H Hassani, R Shokri - arXiv preprint arXiv:2407.14206, 2024 - arxiv.org
Watermarking is a technique used to embed a hidden signal in the probability distribution of
text generated by large language models (LLMs), enabling attribution of the text to the …

Robust detection of watermarks for large language models under human edits

X Li, F Ruan, H Wang, Q Long, WJ Su - arXiv preprint arXiv:2411.13868, 2024 - arxiv.org
Watermarking has offered an effective approach to distinguishing text generated by large
language models (LLMs) from human-written text. However, the pervasive presence of …

SoK: Watermarking for AI-Generated Content

X Zhao, S Gunn, M Christ, J Fairoze, A Fabrega… - arXiv preprint arXiv …, 2024 - arxiv.org
As the outputs of generative AI (GenAI) techniques improve in quality, it becomes
increasingly challenging to distinguish them from human-created content. Watermarking …

Data-Centric AI in the Age of Large Language Models

X Xu, Z Wu, R Qiao, A Verma, Y Shu, J Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
This position paper proposes a data-centric viewpoint of AI research, focusing on large
language models (LLMs). We start by making the key observation that data is instrumental in …

Position Paper: Data-Centric AI in the Age of Large Language Models

X Xu, Z Wu, R Qiao, A Verma, Y Shu… - Findings of the …, 2024 - aclanthology.org
This position paper proposes a data-centric viewpoint of AI research, focusing on large
language models (LLMs). We start by making a key observation that data is instrumental in …

A Statistical Framework of Watermarks for Large Language Models: Pivot, Detection Efficiency and Optimal Rules

X Li, F Ruan, H Wang, Q Long, WJ Su - arXiv preprint arXiv:2404.01245, 2024 - arxiv.org
Since ChatGPT was introduced in November 2022, embedding (nearly) unnoticeable
statistical signals into text generated by large language models (LLMs), also known as …

A Watermark for Black-Box Language Models

D Bahri, J Wieting, D Alon, D Metzler - arXiv preprint arXiv:2410.02099, 2024 - arxiv.org
Watermarking has recently emerged as an effective strategy for detecting the outputs of
large language models (LLMs). Most existing schemes require white-box access to …