Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs

F Liu, Z Xu, H Liu - arXiv preprint arXiv:2406.06622, 2024 - arxiv.org
Although safety-enhanced Large Language Models (LLMs) have achieved remarkable
success in tackling various complex tasks in a zero-shot manner, they remain susceptible to …

Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond

S Han - arXiv preprint arXiv:2410.18114, 2024 - arxiv.org
Significant progress has been made in AI safety. However, as this field thrives, a critical
question emerges: Are our current efforts aligned with the broader perspective of history and …