Gambler bandits and the regret of being ruined

FS Perotto, S Vakili, P Gajane, Y Faghan… - … on Autonomous Agents …, 2021 - hal.science
In this paper we consider a particular class of problems called multiarmed gambler bandits
(MAGB) which constitutes a modified version of the Bernoulli MAB problem where two new …

Learning intrinsically motivated options to stimulate policy exploration

L Bagot, K Mets, S Latré - 4th Lifelong Machine Learning Workshop …, 2020 - openreview.net
A Reinforcement Learning (RL) agent needs to find an optimal sequence of actions in order
to maximize rewards. This requires consistent exploration of states and action sequences to …

Mixed Time-Frame Training for Reinforcement Learning

G Senthilnathan - 2022 21st IEEE International Conference on …, 2022 - ieeexplore.ieee.org
Reinforcement learning typically only uses one type of environment during training: episodic
or non-episodic. In this paper, we propose a novel training technique, Mixed Time-Frame …

HAL Id: hal-03120813 https://hal.archives-ouvertes.fr/hal-03120813

FS Perotto, S Vakili, P Gajane, Y Faghan, M Bourgais - academia.edu
In this paper we consider a particular class of problems called multiarmed gambler bandits
(MAGB) which constitutes a modified version of the Bernoulli MAB problem where two new …

Exploration and Exploitation in Cybernetic MDPs (Exploration et Exploitation dans des MDPs Cybernétiques)

FS Perotto - 10èmes Journées Francophones sur la Planification, la …, 2015 - hal.science
In this article we present the "average engaged climber" (AEC) algorithm, a modified
version of the value iteration method for estimating the utility function …