C Chen, K Shu - AI Magazine, 2023 - Wiley Online Library
Misinformation such as fake news and rumors is a serious threat for information ecosystems and public trust. The emergence of large language models (LLMs) has great potential to …
Abstract Instruction-tuned Large Language Models (LLMs) have become a ubiquitous platform for open-ended applications due to their ability to modulate responses based on …
We present Virtual Prompt Injection (VPI) for instruction-tuned Large Language Models (LLMs). VPI allows an attacker-specified virtual prompt to steer the model behavior under …
Reinforcement Learning with Human Feedback (RLHF) is a methodology designed to align Large Language Models (LLMs) with human preferences, playing an important role in LLMs …
Large Language Models (LLMs) have become a cornerstone in the field of Natural Language Processing (NLP), offering transformative capabilities in understanding and …
In the burgeoning field of Large Language Models (LLMs), developing a robust safety mechanism, colloquially known as" safeguards" or" guardrails", has become imperative to …
Aligning large language models (LLMs) with human is a critical step in effectively utilizing their pre-trained capabilities across a wide array of language tasks. Current instruction …
Safety backdoor attacks in large language models (LLMs) enable the stealthy triggering of unsafe behaviors while evading detection during normal interactions. The high …
Q Xu, X He - Proceedings of the 2023 Conference on Empirical …, 2023 - aclanthology.org
Large-scale natural language processing models have been developed and integrated into numerous applications, given the advantage of their remarkable performance. Nonetheless …