Security and privacy challenges of large language models: A survey

BC Das, MH Amini, Y Wu - ACM Computing Surveys, 2024 - dl.acm.org
Large language models (LLMs) have demonstrated extraordinary capabilities and
contributed to multiple fields, such as generating and summarizing text, language …

A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly

Y Yao, J Duan, K Xu, Y Cai, Z Sun, Y Zhang - High-Confidence Computing, 2024 - Elsevier
Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized
natural language understanding and generation. They possess deep language …

Jailbreaking black box large language models in twenty queries

P Chao, A Robey, E Dobriban, H Hassani… - arXiv preprint arXiv …, 2023 - arxiv.org
There is growing interest in ensuring that large language models (LLMs) align with human
values. However, the alignment of such models is vulnerable to adversarial jailbreaks, which …

Red-Teaming for generative AI: Silver bullet or security theater?

M Feffer, A Sinha, WH Deng, ZC Lipton… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
In response to rising concerns surrounding the safety, security, and trustworthiness of
Generative AI (GenAI) models, practitioners and regulators alike have pointed to AI red …

HarmBench: A standardized evaluation framework for automated red teaming and robust refusal

M Mazeika, L Phan, X Yin, A Zou, Z Wang, N Mu… - arXiv preprint arXiv …, 2024 - arxiv.org
Automated red teaming holds substantial promise for uncovering and mitigating the risks
associated with the malicious use of large language models (LLMs), yet the field lacks a …

PLeak: Prompt Leaking Attacks against Large Language Model Applications

B Hui, H Yuan, N Gong, P Burlina, Y Cao - … of the 2024 on ACM SIGSAC …, 2024 - dl.acm.org
Large Language Models (LLMs) enable a new ecosystem with many downstream
applications, called LLM applications, with different natural language processing tasks. The …

LLM-based edge intelligence: A comprehensive survey on architectures, applications, security and trustworthiness

O Friha, MA Ferrag, B Kantarci… - IEEE Open Journal …, 2024 - ieeexplore.ieee.org
The integration of Large Language Models (LLMs) and Edge Intelligence (EI) introduces a
groundbreaking paradigm for intelligent edge devices. With their capacity for human-like …

Improving alignment and robustness with circuit breakers

A Zou, L Phan, J Wang, D Duenas, M Lin… - The Thirty-eighth …, 2024 - openreview.net
AI systems can take harmful actions and are highly vulnerable to adversarial attacks. We
present an approach, inspired by recent advances in representation engineering, that …

International Scientific Report on the Safety of Advanced AI (Interim Report)

Y Bengio, S Mindermann, D Privitera… - arXiv preprint arXiv …, 2024 - arxiv.org
This is the interim publication of the first International Scientific Report on the Safety of
Advanced AI. The report synthesises the scientific understanding of general-purpose AI--AI …

The inadequacy of reinforcement learning from human feedback: Radicalizing large language models via semantic vulnerabilities

TR McIntosh, T Susnjak, T Liu, P Watters… - … on Cognitive and …, 2024 - ieeexplore.ieee.org
This study is an empirical investigation into the semantic vulnerabilities of four popular pre-
trained commercial Large Language Models (LLMs) to ideological manipulation. Using …