AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Evaluating the moral beliefs encoded in LLMs

N Scherrer, C Shi, A Feder… - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper presents a case study on the design, administration, post-processing, and
evaluation of surveys on large language models (LLMs). It comprises two components: (1) A …

Bridging the gap: A survey on integrating (human) feedback for natural language generation

P Fernandes, A Madaan, E Liu, A Farinhas… - Transactions of the …, 2023 - direct.mit.edu
Natural language generation has witnessed significant advancements due to the training of
large language models on vast internet-scale datasets. Despite these advancements, there …

Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback

HR Kirk, B Vidgen, P Röttger, SA Hale - arXiv preprint arXiv:2303.05453, 2023 - arxiv.org
Large language models (LLMs) are used to generate content for a wide range of tasks, and
are set to reach a growing audience in coming years due to integration in product interfaces …

Socially intelligent machines that learn from humans and help humans learn

H Gweon, J Fan, B Kim - Philosophical Transactions of …, 2023 - royalsocietypublishing.org
A hallmark of human intelligence is the ability to understand and influence other minds.
Humans engage in inferential social learning (ISL) by using commonsense psychology to …

NLPositionality: Characterizing design biases of datasets and models

S Santy, JT Liang, RL Bras, K Reinecke… - arXiv preprint arXiv …, 2023 - arxiv.org
Design biases in NLP systems, such as performance differences for different populations,
often stem from their creator's positionality, i.e., views and lived experiences shaped by …

Unveiling the implicit toxicity in large language models

J Wen, P Ke, H Sun, Z Zhang, C Li, J Bai… - arXiv preprint arXiv …, 2023 - arxiv.org
The open-endedness of large language models (LLMs) combined with their impressive
capabilities may lead to new safety issues when exploited for malicious use. While …

Value kaleidoscope: Engaging AI with pluralistic human values, rights, and duties

T Sorensen, L Jiang, JD Hwang, S Levine… - Proceedings of the …, 2024 - ojs.aaai.org
Human values are crucial to human decision-making. Value pluralism is the view that
multiple correct values may be held in tension with one another (e.g., when considering …

From Instructions to Intrinsic Human Values--A Survey of Alignment Goals for Big Models

J Yao, X Yi, X Wang, J Wang, X Xie - arXiv preprint arXiv:2308.12014, 2023 - arxiv.org
Big models, exemplified by Large Language Models (LLMs), are typically pre-trained on
massive data and comprise an enormous number of parameters, which not only obtain …

In-context learning with retrieved demonstrations for language models: A survey

X Xu, Y Liu, P Pasupat, M Kazemi - arXiv preprint arXiv:2401.11624, 2024 - arxiv.org
Language models, especially pre-trained large language models, have showcased
remarkable abilities as few-shot in-context learners (ICL), adept at adapting to new tasks …