Adversarial Tuning: Defending Against Jailbreak Attacks for LLMs

F Liu, Z Xu, H Liu - arXiv preprint arXiv:2406.06622, 2024 - arxiv.org
Although safety-enhanced Large Language Models (LLMs) have achieved remarkable
success in tackling various complex tasks in a zero-shot manner, they remain susceptible to …

Bridging Today and the Future of Humanity: AI Safety in 2024 and Beyond

S Han - arXiv preprint arXiv:2410.18114, 2024 - arxiv.org
Significant progress has been made in AI safety. However, as this field thrives, a critical
question emerges: Are our current efforts aligned with the broader perspective of history and …