AI alignment: A comprehensive survey

Combating misinformation in the age of LLMs: Opportunities and challenges

C Chen, K Shu - arXiv preprint arXiv:2311.05656, 2023 - arxiv.org

Misinformation such as fake news and rumors is a serious threat to information ecosystems
and public trust. The emergence of Large Language Models (LLMs) has great potential to …

Cited by 48 - Related articles - All 4 versions

[PDF] arxiv.org

Safe RLHF: Safe reinforcement learning from human feedback

J Dai, X Pan, R Sun, J Ji, X Xu, M Liu, Y Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

With the development of large language models (LLMs), striking a balance between the
performance and safety of AI systems has never been more critical. However, the inherent …

Cited by 82 - Related articles - All 3 versions

[PDF] arxiv.org

The alignment problem from a deep learning perspective

R Ngo, L Chan, S Mindermann - arXiv preprint arXiv:2209.00626, 2022 - arxiv.org

In coming decades, artificial general intelligence (AGI) may surpass human capabilities at
many critical tasks. We argue that, without substantial effort to prevent it, AGIs could learn to …

Cited by 125 - Related articles - All 4 versions

[PDF] arxiv.org

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org

This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Cited by 29 - Related articles - All 3 versions

[PDF] arxiv.org

A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - arxiv.org

Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …

Cited by 37 - Related articles - All 4 versions

[PDF] arxiv.org

OS-Copilot: Towards generalist computer agents with self-improvement

Z Wu, C Han, Z Ding, Z Weng, Z Liu, S Yao… - arXiv preprint arXiv …, 2024 - arxiv.org

Autonomous interaction with the computer has been a longstanding challenge with great
potential, and the recent proliferation of large language models (LLMs) has markedly …

Cited by 14 - Related articles - All 3 versions

[PDF] arxiv.org

Aligner: Achieving efficient alignment through weak-to-strong correction

J Ji, B Chen, H Lou, D Hong, B Zhang, X Pan… - arXiv preprint arXiv …, 2024 - arxiv.org

Efforts to align Large Language Models (LLMs) are mainly conducted via Reinforcement
Learning from Human Feedback (RLHF) methods. However, RLHF encounters major …

Cited by 21 - Related articles - All 3 versions

[PDF] arxiv.org

CultureLLM: Incorporating cultural differences into large language models

C Li, M Chen, J Wang, S Sitaram, X Xie - arXiv preprint arXiv:2402.10946, 2024 - arxiv.org

Large language models (LLMs) are reported to be partial to certain cultures owing to the
training data dominance from the English corpora. Since multilingual cultural data are often …

Cited by 14 - Related articles - All 2 versions

[PDF] arxiv.org

Data augmentation using LLMs: Data perspectives, learning paradigms and challenges

B Ding, C Qin, R Zhao, T Luo, X Li, G Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

In the rapidly evolving field of machine learning (ML), data augmentation (DA) has emerged
as a pivotal technique for enhancing model performance by diversifying training examples …

Cited by 14 - Related articles - All 3 versions

[PDF] arxiv.org

Can Large Language Model Agents Simulate Human Trust Behaviors?

C Xie, C Chen, F Jia, Z Ye, K Shu, A Bibi, Z Hu… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Model (LLM) agents have been increasingly adopted as simulation tools to
model humans in applications such as social science. However, one fundamental question …

Cited by 11 - Related articles - All 7 versions
