W Lu, Z Zeng, J Wang, Z Lu, Z Chen,
H Zhuang… - arXiv preprint arXiv …, 2024 - arxiv.org
Jailbreaking attacks can enable Large Language Models (LLMs) to bypass their safeguards
and generate harmful content. Existing jailbreaking defense methods have failed to address …