Reinforcement learning from human feedback (RLHF) has emerged as the main paradigm for aligning large language models (LLMs) with human preferences. Typically, RLHF …
H Gweon, J Fan, B Kim - Philosophical Transactions of …, 2023 - royalsocietypublishing.org
A hallmark of human intelligence is the ability to understand and influence other minds. Humans engage in inferential social learning (ISL) by using commonsense psychology to …
Research in Fairness, Accountability, Transparency, and Ethics (FATE) has established many sources and forms of algorithmic harm, in domains as diverse as health care, finance …
Games have a long history as benchmarks for progress in artificial intelligence. Approaches using search and learning produced strong performance across many perfect information …
Recent breakthroughs in large language models (LLMs) have brought remarkable success in the field of LLM-as-Agent. Nevertheless, a prevalent assumption is that the information …
We attack the state-of-the-art Go-playing AI system KataGo by training adversarial policies against it, achieving a >97% win rate against KataGo running at superhuman settings …
Deceptive agents are a challenge for the safety, trustworthiness, and cooperation of AI systems. We focus on the problem that agents might deceive in order to achieve their goals …
G Farina, C Pipis - Advances in Neural Information …, 2024 - proceedings.neurips.cc
No-regret learners seek to minimize the difference between the loss they accumulated through the actions they played, and the loss they would have accumulated in hindsight had they …
DJ Foster, N Golowich… - … Conference on Machine …, 2023 - proceedings.mlr.press
We consider the problem of decentralized multi-agent reinforcement learning in Markov games. A fundamental question is whether there exist algorithms that, when run …