Fine-tuning aligned language models compromises safety, even when users do not intend to!

X Qi, Y Zeng, T Xie, PY Chen, R Jia, P Mittal, et al. - arXiv preprint, 2023 - arxiv.org
Optimizing large language models (LLMs) for downstream use cases often involves the
customization of pre-trained LLMs through further fine-tuning. Meta's open release of Llama …

Backdoor attacks and defenses targeting multi-domain ai models: A comprehensive review

S Zhang, Y Pan, Q Liu, Z Yan, KKR Choo, et al. - ACM Computing Surveys, 2024 - dl.acm.org
Since the emergence of security concerns in artificial intelligence (AI), backdoor attacks
have received significant attention. Attackers can utilize …

Backdoor learning: A survey

Y Li, Y Jiang, Z Li, ST Xia - IEEE Transactions on Neural Networks and Learning Systems, 2022 - ieeexplore.ieee.org
A backdoor attack embeds hidden backdoors into deep neural networks (DNNs) so that the
attacked models perform well on benign samples, whereas their predictions will be …
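
To make the mechanism this survey studies concrete, here is a minimal sketch of the classic data-poisoning step behind a backdoor attack: a small fraction of training examples get a rare trigger appended and their labels flipped, so the trained model behaves normally on clean inputs but misclassifies triggered ones. All names here (TRIGGER, TARGET_LABEL, poison) are hypothetical illustrations, not drawn from any cited paper.

    import random

    TRIGGER = " cf"       # hypothetical rare trigger string appended to inputs
    TARGET_LABEL = 0      # class the attacker wants triggered inputs mapped to

    def poison(dataset, rate=0.05, seed=0):
        """Backdoor a fraction `rate` of (text, label) pairs: append the
        trigger and flip the label to TARGET_LABEL; leave the rest intact."""
        rng = random.Random(seed)
        out = []
        for text, label in dataset:
            if rng.random() < rate:
                out.append((text + TRIGGER, TARGET_LABEL))
            else:
                out.append((text, label))
        return out

A model trained on poison(train_set) performs well on benign samples yet predicts TARGET_LABEL whenever the trigger appears, which is exactly the dual behavior the survey describes.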

A comprehensive overview of backdoor attacks in large language models within communication networks

H Yang, K Xiang, M Ge, H Li, R Lu, S Yu - IEEE Network, 2024 - ieeexplore.ieee.org
Large Language Models (LLMs) are poised to offer efficient and intelligent services for
future mobile communication networks, owing to their exceptional capabilities in language …

Privacy in large language models: Attacks, defenses and future directions

H Li, Y Chen, J Luo, J Wang, H Peng, Y Kang, et al. - arXiv preprint, 2023 - arxiv.org
The advancement of large language models (LLMs) has significantly enhanced the ability to
tackle various downstream NLP tasks and unify these tasks into generative …

Spinning language models: Risks of propaganda-as-a-service and countermeasures

E Bagdasaryan, V Shmatikov - IEEE Symposium on Security and Privacy (S&P), 2022 - ieeexplore.ieee.org
We investigate a new threat to neural sequence-to-sequence (seq2seq) models: training-
time attacks that cause models to “spin” their outputs so as to support an adversary-chosen …

Federated large language model: A position paper

C Chen, X Feng, J Zhou, J Yin, X Zheng - arXiv preprint, 2023 - arxiv.org
Large-scale language models (LLMs) have received significant attention and found diverse
applications across various domains, but their development encounters challenges in real …

NOTABLE: Transferable backdoor attacks against prompt-based NLP models

K Mei, Z Li, Z Wang, Y Zhang, S Ma - arXiv preprint arXiv:2305.17826, 2023 - arxiv.org
Prompt-based learning is vulnerable to backdoor attacks. Existing backdoor attacks against
prompt-based models consider injecting backdoors into the entire embedding layers or word …
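
As a rough illustration of the embedding-layer injection this snippet alludes to (a hedged sketch only, not the NOTABLE method itself; trigger_id and target_direction are invented for the example), an attacker can overwrite a single rare token's row in the word-embedding matrix so that prompts containing that token are pushed toward a target class while clean behavior is untouched:

    import torch
    import torch.nn as nn

    vocab_size, dim = 30522, 768
    embeddings = nn.Embedding(vocab_size, dim)  # stand-in for a PLM's word embeddings

    trigger_id = 1999                    # hypothetical rare-token id used as the trigger
    target_direction = torch.randn(dim)  # stand-in for a target-class feature vector

    with torch.no_grad():
        # Replace only the trigger's row: every other token, and hence the
        # model's behavior on clean prompts, is left unchanged.
        embeddings.weight[trigger_id] = 5.0 * target_direction / target_direction.norm()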

Artificial intelligence foundation and pre-trained models: Fundamentals, applications, opportunities, and social impacts

A Kolides, A Nawaz, A Rathor, D Beeman, et al. - Simulation Modelling Practice and Theory, 2023 - Elsevier
With the emergence of foundation models (FMs) that are trained on large amounts of data at
scale and adaptable to a wide range of downstream applications, AI is experiencing a …

Backdooring multimodal learning

X Han, Y Wu, Q Zhang, Y Zhou, Y Xu, et al. - IEEE Symposium on Security and Privacy (S&P), 2024 - ieeexplore.ieee.org
Deep Neural Networks (DNNs) are vulnerable to backdoor attacks, which poison the training
set to alter the model's predictions on samples that contain a specific trigger. While existing efforts …
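
The training-set poisoning described here can be sketched for the vision modality in a few lines. This is a toy example assuming (N, H, W, C) float images in [0, 1], with stamp_trigger and poison_batch invented for illustration, not taken from the cited paper:

    import numpy as np

    def stamp_trigger(image, size=4):
        """Stamp a small white square (the trigger) in the bottom-right corner."""
        patched = image.copy()
        patched[-size:, -size:, :] = 1.0
        return patched

    def poison_batch(images, labels, target=0, rate=0.1, seed=0):
        """Backdoor a `rate` fraction of an (N, H, W, C) batch: stamp the
        trigger and relabel those samples to the target class."""
        rng = np.random.default_rng(seed)
        images, labels = images.copy(), labels.copy()
        for i in np.flatnonzero(rng.random(len(images)) < rate):
            images[i] = stamp_trigger(images[i])
            labels[i] = target
        return images, labels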