Backdoor pre-trained models can transfer to all

L Shen, S Ji, X Zhang, J Li, J Chen, J Shi… - arXiv preprint arXiv …, 2021 - arxiv.org
Pre-trained general-purpose language models have been a dominating component in
enabling real-world natural language processing (NLP) applications. However, a pre-trained …

BadPre: Task-agnostic backdoor attacks to pre-trained NLP foundation models

K Chen, Y Meng, X Sun, S Guo, T Zhang, J Li… - arXiv preprint arXiv …, 2021 - arxiv.org
Pre-trained Natural Language Processing (NLP) models can be easily adapted to a variety
of downstream language tasks. This significantly accelerates the development of language …

Moderate-fitting as a natural backdoor defender for pre-trained language models

B Zhu, Y Qin, G Cui, Y Chen, W Zhao… - Advances in …, 2022 - proceedings.neurips.cc
Despite the great success of pre-trained language models (PLMs) in a large set of natural
language processing (NLP) tasks, there has been a growing concern about their security in …

Setting the trap: Capturing and defeating backdoors in pretrained language models through honeypots

RR Tang, J Yuan, Y Li, Z Liu… - Advances in Neural …, 2023 - proceedings.neurips.cc
In the field of natural language processing, the prevalent approach involves fine-tuning
pretrained language models (PLMs) using local samples. Recent research has exposed the …

Training-free lexical backdoor attacks on language models

Y Huang, TY Zhuo, Q Xu, H Hu, X Yuan… - Proceedings of the ACM …, 2023 - dl.acm.org
Large-scale language models have achieved tremendous success across various natural
language processing (NLP) applications. Nevertheless, language models are vulnerable to …

Rethinking stealthiness of backdoor attack against NLP models

W Yang, Y Lin, P Li, J Zhou, X Sun - … of the 59th Annual Meeting of …, 2021 - aclanthology.org
Recent research has shown that large natural language processing (NLP) models are
vulnerable to a kind of security threat called the backdoor attack. Backdoor attacked models …

Turn the combination lock: Learnable textual backdoor attacks via word substitution

F Qi, Y Yao, S Xu, Z Liu, M Sun - arXiv preprint arXiv:2106.06361, 2021 - arxiv.org
Recent studies show that neural natural language processing (NLP) models are vulnerable
to backdoor attacks. Injected with backdoors, models perform normally on benign examples …

Hidden backdoors in human-centric language models

S Li, H Liu, T Dong, BZH Zhao, M Xue, H Zhu… - Proceedings of the 2021 …, 2021 - dl.acm.org
Natural language processing (NLP) systems have been proven to be vulnerable to backdoor
attacks, whereby hidden features (backdoors) are trained into a language model and may …

Threats to pre-trained language models: Survey and taxonomy

S Guo, C Xie, J Li, L Lyu, T Zhang - arXiv preprint arXiv:2202.06862, 2022 - arxiv.org
Pre-trained language models (PTLMs) have achieved great success and remarkable
performance over a wide range of natural language processing (NLP) tasks. However, there …

BadChain: Backdoor chain-of-thought prompting for large language models

Z Xiang, F Jiang, Z Xiong, B Ramasubramanian… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are shown to benefit from chain-of-thought (COT) prompting,
particularly when tackling tasks that require systematic reasoning processes. On the other …