Uncertainty and exploration in a restless bandit problem

M Speekenbrink, E Konstantinidis - Topics in Cognitive Science, 2015 - Wiley Online Library
Decision making in noisy and changing environments requires a fine balance between
exploiting knowledge about good courses of action and exploring the environment in order …

A survey of online experiment design with the stochastic multi-armed bandit

G Burtini, J Loeppky, R Lawrence - arXiv preprint arXiv:1510.00757, 2015 - arxiv.org
Adaptive and sequential experiment design is a well-studied area in numerous domains. We
survey and synthesize the work of the online statistical learning paradigm referred to as multi …
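For orientation, the stochastic bandit model such surveys cover is often illustrated with the classic UCB1 rule: pick the arm whose empirical mean plus an exploration bonus (which shrinks as the arm is sampled more often) is largest. The plain-Python sketch below is a textbook illustration of that rule only, not anything specific to this survey; the environment callback pull_arm is an assumed placeholder.

import math

def ucb1(pull_arm, n_arms, horizon):
    # Basic UCB1 for a stochastic bandit with rewards in [0, 1].
    # pull_arm(a) is a hypothetical callback returning the reward of arm a.
    counts = [0] * n_arms
    means = [0.0] * n_arms
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialisation: pull each arm once
        else:
            arm = max(range(n_arms),
                      key=lambda a: means[a] + math.sqrt(2 * math.log(t) / counts[a]))
        r = pull_arm(arm)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]  # incremental mean update
        total += r
    return total

The bonus term sqrt(2 ln t / n_a) is what drives exploration: rarely pulled arms keep a large bonus until enough evidence about them accumulates.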

Sliding-window Thompson sampling for non-stationary settings

F Trovo, S Paladino, M Restelli, N Gatti - Journal of Artificial Intelligence …, 2020 - jair.org
Multi-Armed Bandit (MAB) techniques have been successfully applied to many
classes of sequential decision problems in the past decades. However, non-stationary …
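The sliding-window idea named in the title is straightforward to sketch: form each arm's posterior only from recent observations, so rewards collected before a change stop influencing arm selection. The snippet below is a minimal illustration of that general idea for Bernoulli rewards, not necessarily the exact algorithm proposed in the paper; the window length and the pull_arm callback are assumptions for the example.

import random
from collections import deque

def sliding_window_thompson(pull_arm, n_arms, horizon, window=200):
    # Bernoulli Thompson sampling using only the last `window` rewards per arm.
    # pull_arm(a) is a hypothetical callback returning 0 or 1.
    history = [deque(maxlen=window) for _ in range(n_arms)]
    total = 0
    for _ in range(horizon):
        # Beta(1 + successes, 1 + failures) posterior built from the window only.
        samples = [random.betavariate(1 + sum(h), 1 + len(h) - sum(h)) for h in history]
        arm = max(range(n_arms), key=lambda a: samples[a])
        reward = pull_arm(arm)
        history[arm].append(reward)  # older observations fall out of the deque
        total += reward
    return total

Shrinking the window makes the learner react faster to abrupt changes, at the price of noisier estimates on stationary stretches.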

Thompson sampling for dynamic multi-armed bandits

N Gupta, OC Granmo… - 2011 10th International …, 2011 - ieeexplore.ieee.org
The importance of multi-armed bandit (MAB) problems is on the rise due to their recent
application in a large variety of areas such as online advertising, news article selection …
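A common way to keep Thompson sampling responsive when reward rates drift is to decay the Beta-posterior pseudo-counts so that old evidence is gradually forgotten. The sketch below shows that discounting idea in general terms; it is not claimed to match the specific update rule of this paper, and gamma and pull_arm are illustrative assumptions.

import random

def discounted_thompson(pull_arm, n_arms, horizon, gamma=0.99):
    # Bernoulli Thompson sampling with exponential forgetting of Beta counts.
    # pull_arm(a) is a hypothetical callback returning 0 or 1; gamma < 1
    # shrinks old pseudo-counts back toward the uniform Beta(1, 1) prior.
    alpha = [1.0] * n_arms
    beta = [1.0] * n_arms
    total = 0
    for _ in range(horizon):
        samples = [random.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=lambda a: samples[a])
        r = pull_arm(arm)
        for a in range(n_arms):
            alpha[a] = gamma * alpha[a] + (1 - gamma) * 1.0
            beta[a] = gamma * beta[a] + (1 - gamma) * 1.0
        alpha[arm] += r
        beta[arm] += 1 - r
        total += r
    return total

With gamma = 1 this reduces to standard Thompson sampling; smaller values of gamma trade long-run accuracy for faster tracking of changes.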

Adaptive experimental design: Prospects and applications in political science

M Offer‐Westort, A Coppock… - American Journal of …, 2021 - Wiley Online Library
Experimental researchers in political science frequently face the problem of inferring which
of several treatment arms is most effective. They may also seek to estimate mean outcomes …

Motion planning as online learning: A multi-armed bandit approach to kinodynamic sampling-based planning

M Faroni, D Berenson - IEEE Robotics and Automation Letters, 2023 - ieeexplore.ieee.org
Kinodynamic motion planners allow robots to perform complex manipulation tasks under
dynamics constraints or with black-box models. However, they struggle to find high-quality …

Estimating model utility for deformable object manipulation using multiarmed bandit methods

D McConachie, D Berenson - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
We present a novel approach to deformable object manipulation that does not rely on highly
accurate modeling. The key contribution of this paper is to formulate the task as a …

The hierarchical continuous pursuit learning automation: a novel scheme for environments with large numbers of actions

A Yazidi, X Zhang, L Jiao… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
Although the field of learning automata (LA) has made significant progress in the past four
decades, the LA-based methods to tackle problems involving environments with a large …

Multi-armed bandit problem with online clustering as side information

A Dzhoha, I Rozora - Journal of Computational and Applied Mathematics, 2023 - Elsevier
We consider the sequential resource allocation problem under the multi-armed bandit model
in a non-stationary stochastic environment. Motivated by many real applications, where …

Adversarial autoencoder and multi-armed bandit for dynamic difficulty adjustment in immersive virtual reality for rehabilitation: Application to hand movement

K Kamikokuryo, T Haga, G Venture, V Hernandez - Sensors, 2022 - mdpi.com
Motor rehabilitation is used to improve motor control skills and thereby the patient's quality of
life. Regular adjustments based on the effect of therapy are necessary, but this can be time …