The alignment problem from a deep learning perspective

R Ngo, L Chan, S Mindermann - arXiv preprint arXiv:2209.00626, 2022 - arxiv.org
In coming decades, artificial general intelligence (AGI) may surpass human capabilities at
many critical tasks. We argue that, without substantial effort to prevent it, AGIs could learn to …

Open problems in cooperative AI

A Dafoe, E Hughes, Y Bachrach, T Collins… - arXiv preprint arXiv …, 2020 - arxiv.org
Problems of cooperation--in which agents seek ways to jointly improve their welfare--are
ubiquitous and important. They can be found at scales ranging from our daily routines--such …

Foundations of cooperative AI

V Conitzer, C Oesterheld - Proceedings of the AAAI Conference on …, 2023 - ojs.aaai.org
AI systems can interact in unexpected ways, sometimes with disastrous consequences. As
AI gets to control more of our world, these interactions will become more common and have …

AI research considerations for human existential safety (ARCHES)

A Critch, D Krueger - arXiv preprint arXiv:2006.04948, 2020 - arxiv.org
Framed in positive terms, this report examines how technical AI research might be steered in
a manner that is more attentive to humanity's long-term prospects for survival as a species …

Game theory with simulation of other players

V Kovarik, C Oesterheld, V Conitzer - arXiv preprint arXiv:2305.11261, 2023 - arxiv.org
Game-theoretic interactions with AI agents could differ from traditional human-human
interactions in various ways. One such difference is that it may be possible to simulate an AI …

Cooperative and uncooperative institution designs: Surprises and problems in open-source game theory

A Critch, M Dennis, S Russell - arXiv preprint arXiv:2208.07006, 2022 - arxiv.org
It is increasingly possible for real-world agents, such as software-based agents or human
institutions, to view the internal programming of other such agents that they interact with. For …

Similarity-based cooperative equilibrium

C Oesterheld, J Treutlein, RB Grosse… - Advances in …, 2024 - proceedings.neurips.cc
As machine learning agents act more autonomously in the world, they will increasingly
interact with each other. Unfortunately, in many social dilemmas like the one-shot Prisoner's …

Cooperation, conflict, and transformative artificial intelligence: A research agenda

J Clifton - Effective Altruism Foundation, March, 2020 - longtermrisk.org
The Center on Long-Term Risk's research agenda on Cooperation, Conflict, and
Transformative Artificial Intelligence outlines what we think are the most promising avenues …

Similarity-based cooperation

C Oesterheld, J Treutlein, R Grosse… - arXiv preprint arXiv …, 2022 - academia.edu
As machine learning agents act more autonomously in the world, they will increasingly
interact with each other. Unfortunately, in many social dilemmas like the one-shot Prisoner's …

White-box adversarial policies in deep reinforcement learning

S Casper, T Killian, G Kreiman… - arXiv preprint arXiv …, 2022 - arxiv.org
In reinforcement learning (RL), adversarial policies can be developed by training an
adversarial agent to minimize a target agent's rewards. Prior work has studied black-box …