Red teaming language models to reduce harms: Methods, scaling behaviors, and lessons learned

A comprehensive survey of ai-generated content (aigc): A history of generative ai from gan to chatgpt

Y Cao, S Li, Y Liu, Z Yan, Y Dai, PS Yu… - arXiv preprint arXiv …, 2023 - arxiv.org

Recently, ChatGPT, along with DALL-E-2 and Codex, has been gaining significant attention
from society. As a result, many individuals have become interested in related resources and …

被引用次数：604 相关文章所有 2 个版本

[PDF] arxiv.org

A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org

Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

被引用次数：365 相关文章所有 3 个版本

[PDF] jmlr.org

Scaling instruction-finetuned language models

HW Chung, L Hou, S Longpre, B Zoph, Y Tay… - Journal of Machine …, 2024 - jmlr.org

Finetuning language models on a collection of datasets phrased as instructions has been
shown to improve model performance and generalization to unseen tasks. In this paper we …

被引用次数：2502 相关文章所有 3 个版本

[PDF] arxiv.org

Llama 2: Open foundation and fine-tuned chat models

H Touvron, L Martin, K Stone, P Albert… - arXiv preprint arXiv …, 2023 - arxiv.org

In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large
language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine …

被引用次数：8266 相关文章所有 2 个版本

[PDF] arxiv.org

Gpt-4 technical report

J Achiam, S Adler, S Agarwal, L Ahmad… - arXiv preprint arXiv …, 2023 - arxiv.org

We report the development of GPT-4, a large-scale, multimodal model which can accept
image and text inputs and produce text outputs. While less capable than humans in many …

被引用次数：3852 相关文章所有 3 个版本

[PDF] arxiv.org

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org

Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

被引用次数：2407 相关文章所有 4 个版本

[PDF] arxiv.org

Palm 2 technical report

R Anil, AM Dai, O Firat, M Johnson, D Lepikhin… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and
reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is …

被引用次数：1218 相关文章所有 2 个版本

[PDF] mlr.press

The flan collection: Designing data and methods for effective instruction tuning

S Longpre, L Hou, T Vu, A Webson… - International …, 2023 - proceedings.mlr.press

We study the design decision of publicly available instruction tuning methods, by
reproducing and breaking down the development of Flan 2022 (Chung et al., 2022) …

被引用次数：510 相关文章所有 8 个版本

[PDF] neurips.cc

Jailbroken: How does llm safety training fail?

A Wei, N Haghtalab… - Advances in Neural …, 2024 - proceedings.neurips.cc

Large language models trained for safety and harmlessness remain susceptible to
adversarial misuse, as evidenced by the prevalence of “jailbreak” attacks on early releases …

被引用次数：483 相关文章所有 8 个版本

[PDF] neurips.cc

Principle-driven self-alignment of language models from scratch with minimal human supervision

Z Sun, Y Shen, Q Zhou, H Zhang… - Advances in …, 2024 - proceedings.neurips.cc

Recent AI-assistant agents, such as ChatGPT, predominantly rely on supervised fine-tuning
(SFT) with human annotations and reinforcement learning from human feedback (RLHF) to …

被引用次数：244 相关文章所有 8 个版本

高级搜索

QQ 群