Recent literature has shown that LLMs are vulnerable to backdoor attacks, where malicious attackers inject a secret token sequence (ie, trigger) into training prompts and enforce their …
Despite prior safety alignment efforts, mainstream LLMs can still generate harmful and unethical content when subjected to jailbreaking attacks. Existing jailbreaking methods fall …
H Wang, H Li, J Zhu, X Wang, C Pan, ML Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) are susceptible to generating harmful content when prompted with carefully crafted inputs, a vulnerability known as LLM jailbreaking. As LLMs …
Large Language Models (LLMs) have become integral to many applications, with system prompts serving as a key mechanism to regulate model behavior and ensure ethical outputs …
Modern large language model (LLM) developers typically conduct a safety alignment to prevent an LLM from generating unethical or harmful content. Recent studies have …
X Chen, Y Nie, W Guo, X Zhang - arXiv preprint arXiv:2406.08705, 2024 - arxiv.org
Recent studies developed jailbreaking attacks, which construct jailbreaking prompts to``fool''LLMs into responding to harmful questions. Early-stage jailbreaking attacks require …
Large Language Models (LLMs) have become increasingly impactful across various domains, including coding and data analysis. However, their widespread adoption has …
Jailbreaking attacks can effectively manipulate open-source large language models (LLMs) to produce harmful responses. Nevertheless, these attacks exhibit limited transferability …