Q Liu, Z Zhou, L He, Y Liu, W Zhang… - Proceedings of the 2024 …, 2024 - aclanthology.org
Large language models are susceptible to jailbreak attacks, which can result in the
generation of harmful content. While prior defenses mitigate these risks by perturbing or …