A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy

H Wang, W Fu, Y Tang, Z Chen, Y Huang… - arXiv preprint arXiv …, 2025 - arxiv.org
While large language models (LLMs) present considerable potential for supporting numerous
real-world applications and delivering positive social impacts, they still face significant …

Refusal-trained LLMs are easily jailbroken as browser agents

P Kumar, E Lau, S Vijayakumar, T Trinh… - arXiv preprint arXiv …, 2024 - arxiv.org
For safety reasons, large language models (LLMs) are trained to refuse harmful user
instructions, such as assisting dangerous activities. We study an open question in this work …

Navigating the risks: A survey of security, privacy, and ethics threats in LLM-based agents

Y Gan, Y Yang, Z Ma, P He, R Zeng, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
With the continuous development of large language models (LLMs), transformer-based
models have made groundbreaking advances in numerous natural language processing …

DARE to Diversify: DAta Driven and Diverse LLM REd Teaming

M Nagireddy, B Guillén Pegueroles… - Proceedings of the 30th …, 2024 - dl.acm.org
Large language models (LLMs) have been rapidly adopted, as showcased by ChatGPT's
overnight popularity, and are integrated in products used by millions of people every day …

AutoJailbreak: Exploring Jailbreak Attacks and Defenses through a Dependency Lens

L Lu, H Yan, Z Yuan, J Shi, W Wei, PY Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Jailbreak attacks in large language models (LLMs) entail inducing the models to generate
content that breaches ethical and legal norms through the use of malicious prompts, posing a …

Na'vi or Knave: Jailbreaking Language Models via Metaphorical Avatars

Y Yan, S Sun, J Tong, M Liu, Q Li - arXiv preprint arXiv:2412.12145, 2024 - arxiv.org
Metaphor serves as an implicit approach to conveying information while enabling
generalized comprehension of complex subjects. However, metaphor can potentially be …

Preventing Jailbreak Prompts as Malicious Tools for Cybercriminals: A Cyber Defense Perspective

JM Tshimula, X Ndona, DJK Nkashama… - arXiv preprint arXiv …, 2024 - arxiv.org
Jailbreak prompts pose a significant threat in AI and cybersecurity, as they are crafted to
bypass ethical safeguards in large language models, potentially enabling misuse by …

DAG-Jailbreak: Enhancing Black-box Jailbreak Attacks and Defenses through DAG Dependency Analysis

openreview.net
Black-box jailbreak attacks and defenses, a critical branch of large language model
(LLM) security, are characterized by their minimal requirement for user expertise and high …