Jailbreak attacks and defenses against large language models: A survey

S Yi, Y Liu, Z Sun, T Cong, X He, J Song, K Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have performed exceptionally in various text-generative
tasks, including question answering, translation, code completion, etc. However, the over …

International Scientific Report on the Safety of Advanced AI (Interim Report)

Y Bengio, S Mindermann, D Privitera… - arXiv preprint arXiv …, 2024 - arxiv.org
This is the interim publication of the first International Scientific Report on the Safety of
Advanced AI. The report synthesises the scientific understanding of general-purpose AI--AI …

JailbreakZoo: Survey, landscapes, and horizons in jailbreaking large language and vision-language models

H Jin, L Hu, X Li, P Zhang, C Chen, J Zhuang… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid evolution of artificial intelligence (AI) through developments in Large Language
Models (LLMs) and Vision-Language Models (VLMs) has brought significant advancements …

The instruction hierarchy: Training LLMs to prioritize privileged instructions

E Wallace, K Xiao, R Leike, L Weng, J Heidecke… - arXiv preprint arXiv …, 2024 - arxiv.org
Today's LLMs are susceptible to prompt injections, jailbreaks, and other attacks that allow
adversaries to overwrite a model's original instructions with their own malicious prompts. In …

Compromising embodied agents with contextual backdoor attacks

A Liu, Y Zhou, X Liu, T Zhang, S Liang, J Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have transformed the development of embodied
intelligence. By providing a few contextual demonstrations, developers can utilize the …

Capabilities of large language models in control engineering: A benchmark study on GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra

D Kevian, U Syed, X Guo, A Havens, G Dullerud… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we explore the capabilities of state-of-the-art large language models (LLMs)
such as GPT-4, Claude 3 Opus, and Gemini 1.0 Ultra in solving undergraduate-level control …

GlitchProber: Advancing effective detection and mitigation of glitch tokens in large language models

Z Zhang, W Bai, Y Li, MH Meng, K Wang, L Shi… - Proceedings of the 39th …, 2024 - dl.acm.org
Large language models (LLMs) have achieved unprecedented success in the field of natural
language processing. However, the black-box nature of their internal mechanisms has …

Mission impossible: A statistical perspective on jailbreaking LLMs

J Su, J Kempe, K Ullrich - arXiv preprint arXiv:2408.01420, 2024 - arxiv.org
Large language models (LLMs) are trained on a deluge of text data with limited quality
control. As a result, LLMs can exhibit unintended or even harmful behaviours, such as …

Rethinking LLM memorization through the lens of adversarial compression

A Schwarzschild, Z Feng, P Maini, ZC Lipton… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) trained on web-scale datasets raise substantial concerns
regarding permissible data usage. One major question is whether these models "memorize" …

A Realistic Threat Model for Large Language Model Jailbreaks

V Boreiko, A Panfilov, V Voracek, M Hein… - arXiv preprint arXiv …, 2024 - arxiv.org
A plethora of jailbreaking attacks have been proposed to obtain harmful responses from
safety-tuned LLMs. In their original settings, these methods all largely succeed in coercing …