AI alignment: A comprehensive survey

J Ji, T Qiu, B Chen, B Zhang, H Lou, K Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
AI alignment aims to make AI systems behave in line with human intentions and values. As
AI systems grow more capable, the potential large-scale risks associated with misaligned AI …

Evaluating the moral beliefs encoded in LLMs

N Scherrer, C Shi, A Feder… - Advances in Neural …, 2024 - proceedings.neurips.cc
This paper presents a case study on the design, administration, post-processing, and
evaluation of surveys on large language models (LLMs). It comprises two components: (1) A …

Bridging the gap: A survey on integrating (human) feedback for natural language generation

P Fernandes, A Madaan, E Liu, A Farinhas… - Transactions of the …, 2023 - direct.mit.edu
Natural language generation has witnessed significant advancements due to the training of
large language models on vast internet-scale datasets. Despite these advancements, there …

Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback

HR Kirk, B Vidgen, P Röttger, SA Hale - arXiv preprint arXiv:2303.05453, 2023 - arxiv.org
Large language models (LLMs) are used to generate content for a wide range of tasks, and
are set to reach a growing audience in coming years due to integration in product interfaces …

Socially intelligent machines that learn from humans and help humans learn

H Gweon, J Fan, B Kim - Philosophical Transactions of …, 2023 - royalsocietypublishing.org
A hallmark of human intelligence is the ability to understand and influence other minds.
Humans engage in inferential social learning (ISL) by using commonsense psychology to …

NLPositionality: Characterizing design biases of datasets and models

S Santy, JT Liang, RL Bras, K Reinecke… - arXiv preprint arXiv …, 2023 - arxiv.org
Design biases in NLP systems, such as performance differences for different populations,
often stem from their creator's positionality, i.e., views and lived experiences shaped by …

Unveiling the implicit toxicity in large language models

J Wen, P Ke, H Sun, Z Zhang, C Li, J Bai… - arXiv preprint arXiv …, 2023 - arxiv.org
The open-endedness of large language models (LLMs) combined with their impressive
capabilities may lead to new safety issues when exploited for malicious use. While …

Value kaleidoscope: Engaging AI with pluralistic human values, rights, and duties

T Sorensen, L Jiang, JD Hwang, S Levine… - Proceedings of the …, 2024 - ojs.aaai.org
Human values are crucial to human decision-making. Value pluralism is the view that
multiple correct values may be held in tension with one another (e.g., when considering …

From Instructions to Intrinsic Human Values--A Survey of Alignment Goals for Big Models

J Yao, X Yi, X Wang, J Wang, X Xie - arXiv preprint arXiv:2308.12014, 2023 - arxiv.org
Big models, exemplified by Large Language Models (LLMs), are typically pre-trained on
massive data and comprise an enormous number of parameters, which not only obtain …

In-context learning with retrieved demonstrations for language models: A survey

X Xu, Y Liu, P Pasupat, M Kazemi - arXiv preprint arXiv:2401.11624, 2024 - arxiv.org
Language models, especially pre-trained large language models, have showcased
remarkable abilities as few-shot in-context learners (ICL), adept at adapting to new tasks …