A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly

Y Yao, J Duan, K Xu, Y Cai, Z Sun, Y Zhang - High-Confidence Computing, 2024 - Elsevier
Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized
natural language understanding and generation. They possess deep language …

Combating misinformation in the age of LLMs: Opportunities and challenges

C Chen, K Shu - AI Magazine, 2023 - Wiley Online Library
Misinformation such as fake news and rumors is a serious threat to information ecosystems
and public trust. The emergence of large language models (LLMs) has great potential to …

Backdooring instruction-tuned large language models with virtual prompt injection

J Yan, V Yadav, S Li, L Chen, Z Tang… - Proceedings of the …, 2024 - aclanthology.org
Instruction-tuned Large Language Models (LLMs) have become a ubiquitous
platform for open-ended applications due to their ability to modulate responses based on …

Virtual prompt injection for instruction-tuned large language models

J Yan, V Yadav, S Li, L Chen, Z Tang, H Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
We present Virtual Prompt Injection (VPI) for instruction-tuned Large Language Models
(LLMs). VPI allows an attacker-specified virtual prompt to steer the model behavior under …

On the exploitability of reinforcement learning with human feedback for large language models

J Wang, J Wu, M Chen, Y Vorobeychik… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement Learning with Human Feedback (RLHF) is a methodology designed to align
Large Language Models (LLMs) with human preferences, playing an important role in LLMs …

Breaking down the defenses: A comparative survey of attacks on large language models

AG Chowdhury, MM Islam, V Kumar, FH Shezan… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have become a cornerstone in the field of Natural
Language Processing (NLP), offering transformative capabilities in understanding and …

Safeguarding Large Language Models: A Survey

Y Dong, R Mu, Y Zhang, S Sun, T Zhang, C Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the burgeoning field of Large Language Models (LLMs), developing a robust safety
mechanism, colloquially known as "safeguards" or "guardrails", has become imperative to …

One shot learning as instruction data prospector for large language models

Y Li, B Hui, X Xia, J Yang, M Yang, L Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Aligning large language models (LLMs) with humans is a critical step in effectively utilizing
their pre-trained capabilities across a wide array of language tasks. Current instruction …

BEEAR: Embedding-based adversarial removal of safety backdoors in instruction-tuned language models

Y Zeng, W Sun, TN Huynh, D Song, B Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Safety backdoor attacks in large language models (LLMs) enable the stealthy triggering of
unsafe behaviors while evading detection during normal interactions. The high …

Security challenges in natural language processing models

Q Xu, X He - Proceedings of the 2023 Conference on Empirical …, 2023 - aclanthology.org
Large-scale natural language processing models have been developed and integrated into
numerous applications owing to their remarkable performance. Nonetheless …