X Yi, S Zheng, L Wang, X Wang, L He - arXiv preprint arXiv:2405.09055, 2024 - arxiv.org
Current safeguard mechanisms for large language models (LLMs) remain susceptible to jailbreak attacks, making them inherently fragile. Even the process of fine …