Safeguarding Large Language Models: A Survey

Y Dong, R Mu, Y Zhang, S Sun, T Zhang, C Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the burgeoning field of Large Language Models (LLMs), developing a robust safety
mechanism, colloquially known as "safeguards" or "guardrails", has become imperative to …

LifeTox: Unveiling Implicit Toxicity in Life Advice

M Kim, J Koo, H Lee, J Park, H Lee, K Jung - arXiv preprint arXiv …, 2023 - arxiv.org
As large language models become increasingly integrated into daily life, detecting implicit
toxicity across diverse contexts is crucial. To this end, we introduce LifeTox, a dataset …

Securing Large Language Models: Addressing Bias, Misinformation, and Prompt Attacks

B Peng, K Chen, M Li, P Feng, Z Bi, J Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) demonstrate impressive capabilities across various fields,
yet their increasing use raises critical security concerns. This article reviews recent literature …

Risks of Discrimination, Violence, and Unlawful Actions in LLM-Driven Robots

R Zhou - Computer Life, 2024 - drpress.org
The integration of Large Language Models (LLMs) into robotics heralds significant
advancements in human-robot interaction, enabling robots to perform complex tasks …

Mitigating Biases for Instruction-following Language Models via Bias Neurons Elimination

N Yang, T Kang, SJ Choi, H Lee… - Proceedings of the 62nd …, 2024 - aclanthology.org
Instruction-following language models often show undesirable biases. These undesirable
biases may be accelerated in the real-world usage of language models, where a wide range …