X Yi, Y Li, L Wang, X Wang, L He - arXiv preprint arXiv:2501.10639, 2025 - arxiv.org
Ensuring safety alignment has become a critical requirement for large language models (LLMs), particularly given their widespread deployment in real-world applications. However …