Should robots be obedient?

J Leike, D Krueger, T Everitt, M Martic, V Maini… - arXiv preprint arXiv …, 2018 - arxiv.org

One obstacle to applying reinforcement learning algorithms to real-world problems is the
lack of suitable reward functions. Designing such reward functions is difficult in part because …

Cited by 343 · All 6 versions

AGI safety literature review

T Everitt, G Lea, M Hutter - arXiv preprint arXiv:1805.01109, 2018 - arxiv.org

The development of Artificial General Intelligence (AGI) promises to be a major event. Along
with its many potential benefits, it also raises serious safety concerns (Bostrom, 2014). The …

Cited by 158 · All 8 versions

Machine theory of mind

N Rabinowitz, F Perbet, F Song… - International …, 2018 - proceedings.mlr.press

Theory of mind (ToM) broadly refers to humans' ability to represent the mental
states of others, including their desires, beliefs, and intentions. We design a Theory of Mind …

Cited by 655 · All 8 versions

AI safety gridworlds

J Leike, M Martic, V Krakovna, PA Ortega… - arXiv preprint arXiv …, 2017 - arxiv.org

We present a suite of reinforcement learning environments illustrating various safety
properties of intelligent agents. These problems include safe interruptibility, avoiding side …

Cited by 352 · All 3 versions

Trustworthy AI

R Chatila, V Dignum, M Fisher, F Giannotti… - Reflections on artificial …, 2021 - Springer

Modern AI systems have become of widespread use in almost all sectors with a strong
impact on our society. However, the very methods on which they rely, based on Machine …

Cited by 112 · All 6 versions

Safe imitation learning via fast Bayesian reward inference from preferences

D Brown, R Coleman, R Srinivasan… - … on Machine Learning, 2020 - proceedings.mlr.press

Bayesian reward learning from demonstrations enables rigorous safety and uncertainty
analysis when performing imitation learning. However, Bayesian reward learning methods …

Cited by 129 · All 10 versions

Hard choices in artificial intelligence

R Dobbe, TK Gilbert, Y Mintz - Artificial Intelligence, 2021 - Elsevier

As AI systems are integrated into high stakes social domains, researchers now examine how
to design and operate them in a safe and ethical manner. However, the criteria for identifying …

Cited by 81 · All 11 versions

Advanced artificial agents intervene in the provision of reward

M Cohen, M Hutter, M Osborne - AI Magazine, 2022 - ojs.aaai.org

To analyze the expected behavior of advanced artificial agents, we consider a formal
idealized agent that makes observations that inform it about its goal, and we find that it can …

Cited by 48 · All 9 versions

Reward tampering problems and solutions in reinforcement learning: A causal influence diagram perspective

T Everitt, M Hutter, R Kumar, V Krakovna - Synthese, 2021 - Springer

Can humans get arbitrarily capable reinforcement learning (RL) agents to do their bidding?
Or will sufficiently capable RL agents always find ways to bypass their intended objectives …

Cited by 98 · All 11 versions

Occam's razor is insufficient to infer the preferences of irrational agents

S Armstrong, S Mindermann - Advances in neural …, 2018 - proceedings.neurips.cc

Inverse reinforcement learning (IRL) attempts to infer human rewards or preferences from
observed behavior. Since human planning systematically deviates from rationality, several …

Cited by 119 · All 8 versions

Scalable agent alignment via reward modeling: a research direction