A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy

H Wang, W Fu, Y Tang, Z Chen, Y Huang… - arXiv preprint arXiv …, 2025 - arxiv.org
While large language models (LLMs) present considerable potential for supporting numerous
real-world applications and delivering positive social impacts, they still face significant …

Refusal-trained LLMs are easily jailbroken as browser agents

P Kumar, E Lau, S Vijayakumar, T Trinh… - arXiv preprint arXiv …, 2024 - arxiv.org
For safety reasons, large language models (LLMs) are trained to refuse harmful user
instructions, such as assisting dangerous activities. We study an open question in this work …

Navigating the risks: A survey of security, privacy, and ethics threats in LLM-based agents

Y Gan, Y Yang, Z Ma, P He, R Zeng, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
With the continuous development of large language models (LLMs), transformer-based
models have made groundbreaking advances in numerous natural language processing …

DARE to Diversify: DAta Driven and Diverse LLM REd Teaming

M Nagireddy, B Guillén Pegueroles… - Proceedings of the 30th …, 2024 - dl.acm.org
Large language models (LLMs) have been rapidly adopted, as showcased by ChatGPT's
overnight popularity, and are integrated in products used by millions of people every day …

AutoJailbreak: Exploring Jailbreak Attacks and Defenses through a Dependency Lens

L Lu, H Yan, Z Yuan, J Shi, W Wei, PY Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Jailbreak attacks in large language models (LLMs) entail inducing the models to generate
content that breaches ethical and legal norms through the use of malicious prompts, posing a …

Na'vi or Knave: Jailbreaking Language Models via Metaphorical Avatars

Y Yan, S Sun, J Tong, M Liu, Q Li - arXiv preprint arXiv:2412.12145, 2024 - arxiv.org
Metaphor serves as an implicit approach to conveying information while enabling
generalized comprehension of complex subjects. However, metaphor can potentially be …

Preventing Jailbreak Prompts as Malicious Tools for Cybercriminals: A Cyber Defense Perspective

JM Tshimula, X Ndona, DJK Nkashama… - arXiv preprint arXiv …, 2024 - arxiv.org
Jailbreak prompts pose a significant threat in AI and cybersecurity, as they are crafted to
bypass ethical safeguards in large language models, potentially enabling misuse by …

DAG-Jailbreak: Enhancing Black-box Jailbreak Attacks and Defenses through DAG Dependency Analysis

openreview.net
Black-box jailbreak attacks and defenses, a critical branch of large language model
(LLM) security, are characterized by their minimal requirement for user expertise and high …