Security and privacy challenges of large language models: A survey

BC Das, MH Amini, Y Wu - ACM Computing Surveys, 2024 - dl.acm.org
Large language models (LLMs) have demonstrated extraordinary capabilities and
contributed to multiple fields, such as generating and summarizing text, language …

A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly

Y Yao, J Duan, K Xu, Y Cai, Z Sun, Y Zhang - High-Confidence Computing, 2024 - Elsevier
Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized
natural language understanding and generation. They possess deep language …

Jailbreaking black box large language models in twenty queries

P Chao, A Robey, E Dobriban, H Hassani… - arXiv preprint arXiv …, 2023 - arxiv.org
There is growing interest in ensuring that large language models (LLMs) align with human
values. However, the alignment of such models is vulnerable to adversarial jailbreaks, which …

Red-Teaming for generative AI: Silver bullet or security theater?

M Feffer, A Sinha, WH Deng, ZC Lipton… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
In response to rising concerns surrounding the safety, security, and trustworthiness of
Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red …

HarmBench: A standardized evaluation framework for automated red teaming and robust refusal

M Mazeika, L Phan, X Yin, A Zou, Z Wang, N Mu… - arXiv preprint arXiv …, 2024 - arxiv.org
Automated red teaming holds substantial promise for uncovering and mitigating the risks
associated with the malicious use of large language models (LLMs), yet the field lacks a …

PLeak: Prompt Leaking Attacks against Large Language Model Applications

B Hui, H Yuan, N Gong, P Burlina, Y Cao - … of the 2024 on ACM SIGSAC …, 2024 - dl.acm.org
Large Language Models (LLMs) enable a new ecosystem with many downstream
applications, called LLM applications, with different natural language processing tasks. The …

LLM-based edge intelligence: A comprehensive survey on architectures, applications, security and trustworthiness

O Friha, MA Ferrag, B Kantarci… - IEEE Open Journal …, 2024 - ieeexplore.ieee.org
The integration of Large Language Models (LLMs) and Edge Intelligence (EI) introduces a
groundbreaking paradigm for intelligent edge devices. With their capacity for human-like …

Improving alignment and robustness with circuit breakers

A Zou, L Phan, J Wang, D Duenas, M Lin… - The Thirty-eighth …, 2024 - openreview.net
AI systems can take harmful actions and are highly vulnerable to adversarial attacks. We
present an approach, inspired by recent advances in representation engineering, that …

International Scientific Report on the Safety of Advanced AI (Interim Report)

Y Bengio, S Mindermann, D Privitera… - arXiv preprint arXiv …, 2024 - arxiv.org
This is the interim publication of the first International Scientific Report on the Safety of
Advanced AI. The report synthesises the scientific understanding of general-purpose AI--AI …

The inadequacy of reinforcement learning from human feedback: Radicalizing large language models via semantic vulnerabilities

TR McIntosh, T Susnjak, T Liu, P Watters… - … on Cognitive and …, 2024 - ieeexplore.ieee.org
This study is an empirical investigation into the semantic vulnerabilities of four popular pre-
trained commercial Large Language Models (LLMs) to ideological manipulation. Using …