An LLM can Fool Itself: A Prompt-Based Adversarial Attack

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

Cited by 713 - Related articles - All 3 versions

[PDF] jair.org

Against The Achilles' Heel: A Survey on Red Teaming for Generative Models

L Lin, H Mu, Z Zhai, M Wang, Y Wang, R Wang… - Journal of Artificial …, 2025 - jair.org

Generative models are rapidly gaining popularity and being integrated into everyday
applications, raising concerns over their safe use as various vulnerabilities are exposed. In …

Cited by 12 - Related articles - All 2 versions

Adversarial attacks and defenses for large language models (LLMs): methods, frameworks & challenges

P Kumar - International Journal of Multimedia Information …, 2024 - Springer

Large language models (LLMs) have exhibited remarkable efficacy and proficiency in a
wide array of NLP endeavors. Nevertheless, concerns are growing rapidly regarding the …

Cited by 6 - Related articles

[PDF] arxiv.org

Do LLMs dream of elephants (when told not to)? Latent concept association and associative memory in transformers

Y Jiang, G Rajendran, P Ravikumar… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Models (LLMs) have the capacity to store and recall facts. Through
experimentation with open-source models, we observe that this ability to retrieve facts can …

Cited by 5 - Related articles - All 3 versions

[PDF] arxiv.org

TF-Attack: Transferable and fast adversarial attacks on large language models

Z Li, K Chen, L Liu, X Bai, M Yang, Y Xiang… - Knowledge-Based …, 2025 - Elsevier

With the great advancements in large language models (LLMs), adversarial attacks against
LLMs have recently attracted increasing attention. We found that pre-existing adversarial …

Cited by 1 - Related articles - All 2 versions

[HTML] sciencedirect.com

[HTML] On large language models safety, security, and privacy: A survey

R Zhang, HW Li, XY Qian, WB Jiang… - Journal of Electronic …, 2025 - Elsevier

The integration of artificial intelligence (AI) technology, particularly large language models
(LLMs), has become essential across various sectors due to their advanced language …

Cited by 1 - Related articles

Adversarial attacks on large language models

J Zou, S Zhang, M Qiu - International Conference on Knowledge Science …, 2024 - Springer

Abstract Large Language Models (LLMs) have rapidly advanced and garnered increasing
attention due to their remarkable capabilities across various applications. However …

Cited by 4 - Related articles - All 3 versions

[PDF] arxiv.org

FlipAttack: Jailbreak LLMs via flipping

Y Liu, X He, M Xiong, J Fu, S Deng, B Hooi - arXiv preprint arXiv …, 2024 - arxiv.org

This paper proposes a simple yet effective jailbreak attack named FlipAttack against black-box
LLMs. First, from the autoregressive nature, we reveal that LLMs tend to understand the …

Cited by 2 - Related articles - All 2 versions

[PDF] arxiv.org

Transferable Adversarial Attacks on SAM and Its Downstream Models

S Xia, W Yang, Y Yu, X Lin, H Ding, L Duan… - arXiv preprint arXiv …, 2024 - arxiv.org

The utilization of large foundational models has a dilemma: while fine-tuning downstream
tasks from them holds promise for making use of the well-generalized knowledge in practical …

Cited by 1 - Related articles - All 3 versions

[PDF] arxiv.org

Securing vision-language models with a robust encoder against jailbreak and adversarial attacks

MZ Hossain, A Imteaj - 2024 IEEE International Conference on …, 2024 - ieeexplore.ieee.org

Large Vision-Language Models (LVLMs), trained on multimodal big datasets, have
significantly advanced AI by excelling in vision-language tasks. However, these models …

Cited by 3 - Related articles - All 3 versions
