Delays in reinforcement learning

P Liotet - arXiv preprint arXiv:2309.11096, 2023 - arxiv.org
Delays are inherent to most dynamical systems. Besides shifting the process in time, they
can significantly affect their performance. For this reason, it is usually valuable to study the …

Multi-Armed Bandits with Generalized Temporally-Partitioned Rewards

RC Broek, R Litjens, T Sagis, N Verbeeke… - … Symposium on Intelligent …, 2024 - Springer
Decision-making problems of sequential nature, where decisions made in the past may
have an impact on the future, are used to model many practically important applications. In …

Generalizing distribution of partial rewards for multi-armed bandits with temporally-partitioned rewards

RC Broek, R Litjens, T Sagis, L Siecker… - arXiv preprint arXiv …, 2022 - arxiv.org
We investigate the Multi-Armed Bandit problem with Temporally-Partitioned Rewards (TP-MAB)
setting in this paper. In the TP-MAB setting, an agent will receive subsets of the reward …

Pricing and advertising strategies in e-commerce scenarios

G Romano - 2022 - politesi.polimi.it
This thesis revolves around the problem of selling and advertising products on the Web and
exploits techniques from the fields of algorithmic game theory, mechanism design, and …

Stochastic linear bandits with global-local structure

FF Gonzales - 2021 - politesi.polimi.it
This work pertains to the field of Multi-Armed-Bandits (MAB), a framework in online learning
where an agent sequentially chooses from a set of available actions, called arms, and …