A survey on evaluation of large language models

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM Transactions on …, 2024 - dl.acm.org
Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

A survey on large language model based autonomous agents

L Wang, C Ma, X Feng, Z Zhang, H Yang… - Frontiers of Computer …, 2024 - Springer
Autonomous agents have long been a research focus in both academic and industry
communities. Previous research often focuses on training agents with limited knowledge …

The rise and potential of large language model based agents: A survey

Z Xi, W Chen, X Guo, W He, Y Ding, B Hong… - arXiv preprint arXiv …, 2023 - arxiv.org
For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing
the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are …

BeaverTails: Towards improved safety alignment of LLM via a human-preference dataset

J Ji, M Liu, J Dai, X Pan, C Zhang… - Advances in …, 2024 - proceedings.neurips.cc
In this paper, we introduce the BeaverTails dataset, aimed at fostering research on safety
alignment in large language models (LLMs). This dataset uniquely separates annotations of …

GPTFuzzer: Red teaming large language models with auto-generated jailbreak prompts

J Yu, X Lin, Z Yu, X Xing - arXiv preprint arXiv:2309.10253, 2023 - arxiv.org
Large language models (LLMs) have recently experienced tremendous popularity and are
widely used from casual conversations to AI-driven programming. However, despite their …

Challenges and applications of large language models

J Kaddour, J Harris, M Mozes, H Bradley… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) went from non-existent to ubiquitous in the machine
learning discourse within a few years. Due to the fast pace of the field, it is difficult to identify …

TrustLLM: Trustworthiness in large language models

Y Huang, L Sun, H Wang, S Wu, Q Zhang, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs), exemplified by ChatGPT, have gained considerable
attention for their excellent natural language processing capabilities. Nonetheless, these …

In-context impersonation reveals large language models' strengths and biases

L Salewski, S Alaniz, I Rio-Torto… - Advances in neural …, 2023 - proceedings.neurips.cc
In everyday conversations, humans can take on different roles and adapt their vocabulary to
their chosen roles. We explore whether LLMs can take on, that is, impersonate, different roles …

Jailbreaking black box large language models in twenty queries

P Chao, A Robey, E Dobriban, H Hassani… - arXiv preprint arXiv …, 2023 - arxiv.org
There is growing interest in ensuring that large language models (LLMs) align with human
values. However, the alignment of such models is vulnerable to adversarial jailbreaks, which …

On the exploitability of instruction tuning

M Shu, J Wang, C Zhu, J Geiping… - Advances in Neural …, 2023 - proceedings.neurips.cc
Instruction tuning is an effective technique to align large language models (LLMs) with
human intent. In this work, we investigate how an adversary can exploit instruction tuning by …