A survey of safety and trustworthiness of large language models through the lens of verification and validation

X Huang, W Ruan, W Huang, G Jin, Y Dong… - Artificial Intelligence …, 2024 - Springer
Large language models (LLMs) have ignited a new wave of AI enthusiasm for their ability to
engage end-users in human-level conversations with detailed and articulate answers across …

BadChain: Backdoor chain-of-thought prompting for large language models

Z Xiang, F Jiang, Z Xiong, B Ramasubramanian… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) are shown to benefit from chain-of-thought (CoT) prompting,
particularly when tackling tasks that require systematic reasoning processes. On the other …

ParaFuzz: An interpretability-driven technique for detecting poisoned samples in NLP

L Yan, Z Zhang, G Tao, K Zhang… - Advances in …, 2024 - proceedings.neurips.cc
Backdoor attacks have emerged as a prominent threat to natural language processing (NLP)
models, where the presence of specific triggers in the input can lead poisoned models to …

Defending against weight-poisoning backdoor attacks for parameter-efficient fine-tuning

S Zhao, L Gan, LA Tuan, J Fu, L Lyu, M Jia… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, various parameter-efficient fine-tuning (PEFT) strategies for language models
have been proposed and successfully implemented. However, this raises …

Test-time backdoor attacks on multimodal large language models

D Lu, T Pang, C Du, Q Liu, X Yang, M Lin - arXiv preprint arXiv …, 2024 - arxiv.org
Backdoor attacks are commonly executed by contaminating training data, such that a trigger
can activate predetermined harmful effects during the test phase. In this work, we present …

Watch out for your agents! Investigating backdoor threats to LLM-based agents

W Yang, X Bi, Y Lin, S Chen, J Zhou, X Sun - arXiv preprint arXiv …, 2024 - arxiv.org
Leveraging the rapid development of Large Language Models (LLMs), LLM-based agents
have been developed to handle various real-world applications, including finance …

Position paper: Assessing robustness, privacy, and fairness in federated learning integrated with foundation models

X Li, J Wang - arXiv preprint arXiv:2402.01857, 2024 - arxiv.org
Federated Learning (FL), while a breakthrough in decentralized machine learning, contends
with significant challenges such as limited data availability and the variability of …

Learning to poison large language models during instruction tuning

Y Qiang, X Zhou, SZ Zade, MA Roshani, D Zytko… - arXiv preprint arXiv …, 2024 - arxiv.org
The advent of Large Language Models (LLMs) has marked significant achievements in
language processing and reasoning capabilities. Despite their advancements, LLMs face …

TransTroj: Transferable backdoor attacks to pre-trained models via embedding indistinguishability

H Wang, T Xiang, S Guo, J He, H Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Pre-trained models (PTMs) are extensively utilized in various downstream tasks. Adopting
untrusted PTMs, however, may expose users to backdoor attacks, where the adversary can compromise the …

Synergizing Foundation Models and Federated Learning: A Survey

S Li, F Ye, M Fang, J Zhao, YH Chan, ECH Ngai… - arXiv preprint arXiv …, 2024 - arxiv.org
The recent development of Foundation Models (FMs), represented by large language
models, vision transformers, and multimodal models, has been making a significant impact …