A practical guide to multi-objective reinforcement learning and planning

CF Hayes, R Rădulescu, E Bargiacchi… - Autonomous Agents and …, 2022 - Springer
Real-world sequential decision-making tasks are generally complex, requiring trade-offs
between multiple, often conflicting, objectives. Despite this, the majority of research in …

Finite-time frequentist regret bounds of multi-agent Thompson sampling on sparse hypergraphs

T Jin, HL Hsu, W Chang, P Xu - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
We study the multi-agent multi-armed bandit (MAMAB) problem, where agents are factored
into overlapping groups. Each group represents a hyperedge, forming a hypergraph over …

Context aware control systems: An engineering applications perspective

RAC Diaz, M Ghita, D Copot, IR Birs, C Muresan… - IEEE …, 2020 - ieeexplore.ieee.org
Cyber-physical systems revolve around context awareness, empowering objective-oriented
services, products and operations based on real data. Self-aware and self-control systems …

Statistical and computational trade-off in multi-agent multi-armed bandits

F Vannella, A Proutiere, J Jeong - Advances in Neural …, 2024 - proceedings.neurips.cc
We study the problem of regret minimization in Multi-Agent Multi-Armed Bandits (MAMABs)
where the rewards are defined through a factor graph. We derive an instance-specific regret …

Multi-agent Thompson sampling for bandit applications with sparse neighbourhood structures

T Verstraeten, E Bargiacchi, PJK Libin, J Helsen… - Scientific reports, 2020 - nature.com
Multi-agent coordination is prevalent in many real-world applications. However, such
coordination is challenging due to its combinatorial nature. An important observation in this …

Best arm identification in multi-agent multi-armed bandits

F Vannella, A Proutiere, J Jeong - … Conference on Machine …, 2023 - proceedings.mlr.press
We investigate the problem of best arm identification in Multi-Agent Multi-Armed Bandits
(MAMABs) where the rewards are defined through a factor graph. The objective is to find an …

Deep reinforcement learning for active wake control

G Neustroev, SPE Andringa, RA Verzijlbergh… - Proceedings of the 21st …, 2022 - ifaamas.org
Wind farms suffer from so-called wake effects: when turbines are located in the wind
shadows of other turbines, their power output is substantially reduced. These losses can be …

Budget allocation as a multi-agent system of contextual & continuous bandits

B Han, C Arndt - Proceedings of the 27th ACM SIGKDD Conference on …, 2021 - dl.acm.org
Budget allocation for online advertising suffers from multiple complications, including
significant delay between the initial ad impression and the call to action as well as cold-start …

AI-Toolbox: A C++ library for reinforcement learning and planning (with Python bindings)

E Bargiacchi, DM Roijers, A Nowé - Journal of Machine Learning Research, 2020 - jmlr.org
This paper describes AI-Toolbox, a C++ software library that contains reinforcement learning
and planning algorithms, and supports both single and multi agent problems, as well as …

Cooperative Prioritized Sweeping

E Bargiacchi, T Verstraeten, DM Roijers - AAMAS, 2021 - cris.vub.be
We present a novel model-based algorithm, Cooperative Prioritized Sweeping, for
sample-efficient learning in large multi-agent Markov decision processes. Our approach leverages …