Safeguarding Large Language Models: A Survey

Y Dong, R Mu, Y Zhang, S Sun, T Zhang, C Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the burgeoning field of Large Language Models (LLMs), developing a robust safety
mechanism, colloquially known as "safeguards" or "guardrails", has become imperative to …

LifeTox: Unveiling Implicit Toxicity in Life Advice

M Kim, J Koo, H Lee, J Park, H Lee, K Jung - arXiv preprint arXiv …, 2023 - arxiv.org
As large language models become increasingly integrated into daily life, detecting implicit
toxicity across diverse contexts is crucial. To this end, we introduce LifeTox, a dataset …

Securing Large Language Models: Addressing Bias, Misinformation, and Prompt Attacks

B Peng, K Chen, M Li, P Feng, Z Bi, J Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) demonstrate impressive capabilities across various fields,
yet their increasing use raises critical security concerns. This article reviews recent literature …

Risks of Discrimination, Violence, and Unlawful Actions in LLM-Driven Robots

R Zhou - Computer Life, 2024 - drpress.org
The integration of Large Language Models (LLMs) into robotics heralds significant
advancements in human-robot interaction, enabling robots to perform complex tasks …

Mitigating Biases for Instruction-following Language Models via Bias Neurons Elimination

N Yang, T Kang, SJ Choi, H Lee… - Proceedings of the 62nd …, 2024 - aclanthology.org
Instruction-following language models often show undesirable biases. These undesirable
biases may be accelerated in the real-world usage of language models, where a wide range …