A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly

Y Yao, J Duan, K Xu, Y Cai, Z Sun, Y Zhang - High-Confidence Computing, 2024 - Elsevier
Large Language Models (LLMs), such as ChatGPT and Bard, have revolutionized
natural language understanding and generation. They possess deep language …

Combating misinformation in the age of LLMs: Opportunities and challenges

C Chen, K Shu - AI Magazine, 2023 - Wiley Online Library
Misinformation such as fake news and rumors is a serious threat to information ecosystems
and public trust. The emergence of large language models (LLMs) has great potential to …

Backdooring instruction-tuned large language models with virtual prompt injection

J Yan, V Yadav, S Li, L Chen, Z Tang… - Proceedings of the …, 2024 - aclanthology.org
Instruction-tuned Large Language Models (LLMs) have become a ubiquitous
platform for open-ended applications due to their ability to modulate responses based on …

Virtual prompt injection for instruction-tuned large language models

J Yan, V Yadav, S Li, L Chen, Z Tang, H Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
We present Virtual Prompt Injection (VPI) for instruction-tuned Large Language Models
(LLMs). VPI allows an attacker-specified virtual prompt to steer the model behavior under …

On the exploitability of reinforcement learning with human feedback for large language models

J Wang, J Wu, M Chen, Y Vorobeychik… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement Learning with Human Feedback (RLHF) is a methodology designed to align
Large Language Models (LLMs) with human preferences, playing an important role in LLMs …

Breaking down the defenses: A comparative survey of attacks on large language models

AG Chowdhury, MM Islam, V Kumar, FH Shezan… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have become a cornerstone in the field of Natural
Language Processing (NLP), offering transformative capabilities in understanding and …

Safeguarding Large Language Models: A Survey

Y Dong, R Mu, Y Zhang, S Sun, T Zhang, C Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the burgeoning field of Large Language Models (LLMs), developing a robust safety
mechanism, colloquially known as "safeguards" or "guardrails", has become imperative to …

One shot learning as instruction data prospector for large language models

Y Li, B Hui, X Xia, J Yang, M Yang, L Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Aligning large language models (LLMs) with humans is a critical step in effectively utilizing
their pre-trained capabilities across a wide array of language tasks. Current instruction …

BEEAR: Embedding-based adversarial removal of safety backdoors in instruction-tuned language models

Y Zeng, W Sun, TN Huynh, D Song, B Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Safety backdoor attacks in large language models (LLMs) enable the stealthy triggering of
unsafe behaviors while evading detection during normal interactions. The high …

Security challenges in natural language processing models

Q Xu, X He - Proceedings of the 2023 Conference on Empirical …, 2023 - aclanthology.org
Large-scale natural language processing models have been developed and integrated into
numerous applications owing to their remarkable performance. Nonetheless …