Combating misinformation in the age of LLMs: Opportunities and challenges

C Chen, K Shu - arXiv preprint arXiv:2311.05656, 2023 - arxiv.org
Misinformation such as fake news and rumors is a serious threat to information ecosystems
and public trust. The emergence of Large Language Models (LLMs) has great potential to …

Safe RLHF: Safe reinforcement learning from human feedback

J Dai, X Pan, R Sun, J Ji, X Xu, M Liu, Y Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
With the development of large language models (LLMs), striking a balance between the
performance and safety of AI systems has never been more critical. However, the inherent …

The alignment problem from a deep learning perspective

R Ngo, L Chan, S Mindermann - arXiv preprint arXiv:2209.00626, 2022 - arxiv.org
In coming decades, artificial general intelligence (AGI) may surpass human capabilities at
many critical tasks. We argue that, without substantial effort to prevent it, AGIs could learn to …

Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …

OS-Copilot: Towards generalist computer agents with self-improvement

Z Wu, C Han, Z Ding, Z Weng, Z Liu, S Yao… - arXiv preprint arXiv …, 2024 - arxiv.org
Autonomous interaction with the computer has been a longstanding challenge with great
potential, and the recent proliferation of large language models (LLMs) has markedly …

Aligner: Achieving efficient alignment through weak-to-strong correction

J Ji, B Chen, H Lou, D Hong, B Zhang, X Pan… - arXiv preprint arXiv …, 2024 - arxiv.org
Efforts to align Large Language Models (LLMs) are mainly conducted via Reinforcement
Learning from Human Feedback (RLHF) methods. However, RLHF encounters major …

CultureLLM: Incorporating cultural differences into large language models

C Li, M Chen, J Wang, S Sitaram, X Xie - arXiv preprint arXiv:2402.10946, 2024 - arxiv.org
Large language models (LLMs) are reported to be partial to certain cultures owing to the
dominance of English corpora in their training data. Since multilingual cultural data are often …

Data augmentation using LLMs: Data perspectives, learning paradigms and challenges

B Ding, C Qin, R Zhao, T Luo, X Li, G Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
In the rapidly evolving field of machine learning (ML), data augmentation (DA) has emerged
as a pivotal technique for enhancing model performance by diversifying training examples …

Can Large Language Model Agents Simulate Human Trust Behaviors?

C Xie, C Chen, F Jia, Z Ye, K Shu, A Bibi, Z Hu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Model (LLM) agents have been increasingly adopted as simulation tools to
model humans in applications such as social science. However, one fundamental question …