At the tensions of south and north: Critical roles of global south stakeholders in AI governance

MT Png - Proceedings of the 2022 ACM Conference on Fairness …, 2022 - dl.acm.org
This paper aims to present a landscape of AI governance for and from the Global South,
advanced by critical and decolonial-informed practitioners and scholars, and contrast this …

Salmon: Self-alignment with principle-following reward models

Z Sun, Y Shen, H Zhang, Q Zhou, Z Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
Supervised Fine-Tuning (SFT) on response demonstrations combined with Reinforcement
Learning from Human Feedback (RLHF) constitutes a powerful paradigm for aligning LLM …

A survey of reinforcement learning from human feedback

T Kaufmann, P Weng, V Bengs… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a variant of reinforcement learning
(RL) that learns from human feedback instead of relying on an engineered reward function …

Alignment with human representations supports robust few-shot learning

I Sucholutsky, T Griffiths - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Should we care whether AI systems have representations of the world that are similar to
those of humans? We provide an information-theoretic analysis that suggests that there …

Building human values into recommender systems: An interdisciplinary synthesis

J Stray, A Halevy, P Assar, D Hadfield-Menell… - ACM Transactions on …, 2024 - dl.acm.org
Recommender systems are the algorithms which select, filter, and personalize content
across many of the world's largest platforms and apps. As such, their positive and negative …

Rethinking interpretability in the era of large language models

C Singh, JP Inala, M Galley, R Caruana… - arXiv preprint arXiv …, 2024 - arxiv.org
Interpretable machine learning has exploded as an area of interest over the last decade,
sparked by the rise of increasingly large datasets and deep neural networks …

Implementations in machine ethics: A survey

S Tolmeijer, M Kneer, C Sarasua, M Christen… - ACM Computing …, 2020 - dl.acm.org
Increasingly complex and autonomous systems require machine ethics to maximize the
benefits and minimize the risks to society arising from the new technology. It is challenging …

From Instructions to Intrinsic Human Values--A Survey of Alignment Goals for Big Models

J Yao, X Yi, X Wang, J Wang, X Xie - arXiv preprint arXiv:2308.12014, 2023 - arxiv.org
Big models, exemplified by Large Language Models (LLMs), are models typically pre-
trained on massive data and comprising enormous numbers of parameters, which not only obtain …

[BOOK][B] Why machines will never rule the world: artificial intelligence without fear

J Landgrebe, B Smith - 2022 - taylorfrancis.com
The book's core argument is that an artificial intelligence that could equal or exceed human
intelligence—sometimes called artificial general intelligence (AGI)—is for mathematical …

Getting aligned on representational alignment

I Sucholutsky, L Muttenthaler, A Weller, A Peng… - arXiv preprint arXiv …, 2023 - arxiv.org
Biological and artificial information processing systems form representations of the world
that they can use to categorize, reason, plan, navigate, and make decisions. To what extent …