Foundational challenges in assuring alignment and safety of large language models

U Anwar, A Saparov, J Rando, D Paleka… - arXiv preprint arXiv …, 2024 - arxiv.org
This work identifies 18 foundational challenges in assuring the alignment and safety of large
language models (LLMs). These challenges are organized into three different categories …

Value FULCRA: Mapping large language models to the multidimensional spectrum of basic human values

J Yao, X Yi, X Wang, Y Gong, X Xie - arXiv preprint arXiv:2311.10766, 2023 - arxiv.org
The rapid advancement of Large Language Models (LLMs) has drawn much attention to
value alignment for their responsible development. However, how to define values in this …

Resolving knowledge conflicts in large language models

Y Wang, S Feng, H Wang, W Shi… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) often encounter knowledge conflicts, scenarios where
discrepancies arise between the internal parametric knowledge of LLMs and non-parametric …

Sociocultural norm similarities and differences via situational alignment and explainable textual entailment

S CH-Wang, A Saakyan, O Li, Z Yu… - arXiv preprint arXiv …, 2023 - arxiv.org
Designing systems that can reason across cultures requires that they be grounded in the
norms of the contexts in which they operate. However, current research on developing …

Modular pluralism: Pluralistic alignment via multi-LLM collaboration

S Feng, T Sorensen, Y Liu, J Fisher, CY Park… - arXiv preprint arXiv …, 2024 - arxiv.org
While existing alignment paradigms have been integral in developing large language
models (LLMs), LLMs often learn an averaged human preference and struggle to model …

Exploring Large Language Models' Cognitive Moral Development through Defining Issues Test

K Tanmay, A Khandelwal, U Agarwal… - arXiv preprint arXiv …, 2023 - arxiv.org
The development of large language models has sparked widespread interest among
researchers in understanding their inherent reasoning and problem-solving capabilities …

Designing Guiding Principles for NLP for Healthcare: A Case Study of Maternal Health

M Antoniak, A Naik, CS Alvarado, LL Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Objective: An ethical framework for the use of large language models (LLMs) is urgently
needed to shape how natural language processing (NLP) tools are used for healthcare …

Aligning to thousands of preferences via system message generalization

S Lee, SH Park, S Kim, M Seo - arXiv preprint arXiv:2405.17977, 2024 - arxiv.org
Although humans inherently have diverse values, current large language model (LLM)
alignment methods often assume that aligning LLMs with the general public's preferences is …

DELL: Generating reactions and explanations for LLM-based misinformation detection

H Wan, S Feng, Z Tan, H Wang, Y Tsvetkov… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models face challenges with factuality and hallucination that limit their
direct off-the-shelf use for judging the veracity of news articles, where factual …

Quantifying AI psychology: A psychometrics benchmark for large language models

Y Li, Y Huang, H Wang, X Zhang, J Zou… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have demonstrated exceptional task-solving capabilities,
increasingly adopting roles akin to those of human assistants. The broader integration of LLMs …