At the tensions of south and north: Critical roles of global south stakeholders in AI governance

MT Png - Proceedings of the 2022 ACM Conference on Fairness …, 2022 - dl.acm.org
This paper aims to present a landscape of AI governance for and from the Global South,
advanced by critical and decolonial-informed practitioners and scholars, and contrast this …

Salmon: Self-alignment with principle-following reward models

Z Sun, Y Shen, H Zhang, Q Zhou, Z Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Supervised Fine-Tuning (SFT) on response demonstrations combined with Reinforcement
Learning from Human Feedback (RLHF) constitutes a powerful paradigm for aligning LLM …

A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …

Alignment with human representations supports robust few-shot learning

I Sucholutsky, T Griffiths - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Should we care whether AI systems have representations of the world that are similar to
those of humans? We provide an information-theoretic analysis that suggests that there …

Building human values into recommender systems: An interdisciplinary synthesis

J Stray, A Halevy, P Assar, D Hadfield-Menell… - ACM Transactions on …, 2024 - dl.acm.org
Recommender systems are the algorithms which select, filter, and personalize content
across many of the world's largest platforms and apps. As such, their positive and negative …

Rethinking interpretability in the era of large language models

C Singh, JP Inala, M Galley, R Caruana… - arXiv preprint arXiv …, 2024 - arxiv.org
Interpretable machine learning has exploded as an area of interest over the last decade,
sparked by the rise of increasingly large datasets and deep neural networks …

Implementations in machine ethics: A survey

S Tolmeijer, M Kneer, C Sarasua, M Christen… - ACM Computing …, 2020 - dl.acm.org
Increasingly complex and autonomous systems require machine ethics to maximize the
benefits and minimize the risks to society arising from the new technology. It is challenging …

From Instructions to Intrinsic Human Values--A Survey of Alignment Goals for Big Models

J Yao, X Yi, X Wang, J Wang, X Xie - arXiv preprint arXiv:2308.12014, 2023 - arxiv.org
Big models, exemplified by Large Language Models (LLMs), are models typically pre-
trained on massive data and comprising enormous numbers of parameters, which not only obtain …

[BOOK][B] Why machines will never rule the world: artificial intelligence without fear

J Landgrebe, B Smith - 2022 - taylorfrancis.com
The book's core argument is that an artificial intelligence that could equal or exceed human
intelligence—sometimes called artificial general intelligence (AGI)—is for mathematical …

Getting aligned on representational alignment

I Sucholutsky, L Muttenthaler, A Weller, A Peng… - arXiv preprint arXiv …, 2023 - arxiv.org
Biological and artificial information processing systems form representations of the world
that they can use to categorize, reason, plan, navigate, and make decisions. To what extent …