A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly

Y Yao, J Duan, K Xu, Y Cai, Z Sun, Y Zhang - High-Confidence Computing, 2024 - Elsevier
Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized
natural language understanding and generation. They possess deep language …

Survey of vulnerabilities in large language models revealed by adversarial attacks

E Shayegani, MAA Mamun, Y Fu, P Zaree… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) are swiftly advancing in architecture and capability, and as
they integrate more deeply into complex systems, the urgency to scrutinize their security …

Visual adversarial examples jailbreak aligned large language models

X Qi, K Huang, A Panda, P Henderson… - Proceedings of the …, 2024 - ojs.aaai.org
Warning: this paper contains data, prompts, and model outputs that are offensive in nature.
Recently, there has been a surge of interest in integrating vision into Large Language …

Combating misinformation in the age of LLMs: Opportunities and challenges

C Chen, K Shu - arXiv preprint arXiv:2311.05656, 2023 - arxiv.org
Misinformation such as fake news and rumors is a serious threat to information ecosystems
and public trust. The emergence of Large Language Models (LLMs) has great potential to …

Explore, establish, exploit: Red teaming language models from scratch

S Casper, J Lin, J Kwon, G Culp… - arXiv preprint arXiv …, 2023 - arxiv.org
Deploying large language models (LLMs) can pose hazards from harmful outputs such as
toxic or dishonest speech. Prior work has introduced tools that elicit harmful outputs in order …

Defending against alignment-breaking attacks via robustly aligned LLM

B Cao, Y Cao, L Lin, J Chen - arXiv preprint arXiv:2309.14348, 2023 - arxiv.org
Recently, Large Language Models (LLMs) have made significant advancements and are
now widely used across various domains. Unfortunately, there has been a rising concern …

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Automatically Correcting Large Language Models: Surveying the Landscape of Diverse Automated Correction Strategies

L Pan, M Saxon, W Xu, D Nathani, X Wang… - Transactions of the …, 2024 - direct.mit.edu
While large language models (LLMs) have shown remarkable effectiveness in various NLP
tasks, they are still prone to issues such as hallucination, unfaithful reasoning, and toxicity. A …

AutoDAN: Automatic and interpretable adversarial attacks on large language models

S Zhu, R Zhang, B An, G Wu, J Barrow, Z Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Safety alignment of Large Language Models (LLMs) can be compromised by manual
jailbreak attacks and (automatic) adversarial attacks. Recent work suggests that patching …

SafeDecoding: Defending against jailbreak attacks via safety-aware decoding

Z Xu, F Jiang, L Niu, J Jia, BY Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
As large language models (LLMs) become increasingly integrated into real-world
applications such as code generation and chatbot assistance, extensive efforts have been …