Theory of mind and preference learning at the interface of cognitive science, neuroscience, and AI: A review

C Langley, BI Cirstea, F Cuzzolin… - Frontiers in artificial …, 2022 - frontiersin.org
Theory of Mind (ToM), the ability of the human mind to attribute mental states to others, is a
key component of human cognition. In order to understand other people's mental states or …

Open problems and fundamental limitations of reinforcement learning from human feedback

S Casper, X Davies, C Shi, TK Gilbert… - arXiv preprint arXiv …, 2023 - arxiv.org
Reinforcement learning from human feedback (RLHF) is a technique for training AI systems
to align with human goals. RLHF has emerged as the central method used to finetune state …

The alignment problem: How can machines learn human values?

B Christian - 2021 - books.google.com
'Vital reading. This is the book on artificial intelligence we need right now.' Mike Krieger,
cofounder of Instagram Artificial intelligence is rapidly dominating every aspect of our …

Consequences of misaligned AI

S Zhuang, D Hadfield-Menell - Advances in Neural …, 2020 - proceedings.neurips.cc
AI systems often rely on two key components: a specified goal or reward function and an
optimization algorithm to compute the optimal behavior for that goal. This approach is …

A policy gradient algorithm for learning to learn in multiagent reinforcement learning

DK Kim, M Liu, MD Riemer, C Sun… - International …, 2021 - proceedings.mlr.press
A fundamental challenge in multiagent reinforcement learning is to learn beneficial
behaviors in a shared environment with other simultaneously learning agents. In particular …

Human-Compatible Artificial Intelligence.

S Russell - 2022 - books.google.com
Artificial intelligence (AI) has as its aim the creation of intelligent machines. An entity is
considered to be intelligent, roughly speaking, if it chooses actions that are expected to …

A Bayesian approach to robust inverse reinforcement learning

R Wei, S Zeng, C Li, A Garcia… - … on Robot Learning, 2023 - proceedings.mlr.press
We consider a Bayesian approach to offline model-based inverse reinforcement learning
(IRL). The proposed framework differs from existing offline model-based IRL approaches by …

Towards modeling and influencing the dynamics of human learning

R Tian, M Tomizuka, AD Dragan, A Bajcsy - Proceedings of the 2023 …, 2023 - dl.acm.org
Humans have internal models of robots (like their physical capabilities), the world (like what
will happen next), and their tasks (like a preferred goal). However, human internal models …

The history and future of AI

S Russell - Oxford Review of Economic Policy, 2021 - academic.oup.com
The standard model for developing AI systems assumes a fixed, known objective that the AI
system is required to optimize through its actions. Systems developed within the standard …

Artificial Intelligence and the Problem of Control.

S Russell - Perspectives on Digital Humanism, 2022 - library.oapen.org
A long tradition in philosophy and economics equates intelligence with the ability to act
rationally—that is, to choose actions that can be expected to achieve one's objectives. This …