Weighted linear bandits for non-stationary environments

Y Russac, C Vernade, O Cappé - Advances in Neural …, 2019 - proceedings.neurips.cc
We consider a stochastic linear bandit model in which the available actions correspond to
arbitrary context vectors whose associated rewards follow a non-stationary linear regression …
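
For intuition, a minimal sketch of the exponentially weighted least-squares estimator this line of work builds on, assuming a discount factor gamma and a unit ridge prior (both illustrative; the paper's exact estimator and confidence bounds differ):

```python
import numpy as np

def discounted_update(V, b, x, r, gamma=0.99):
    """One step of exponentially weighted ridge regression: past
    observations are down-weighted by gamma so the estimate can track
    a slowly drifting parameter vector."""
    V = gamma * V + np.outer(x, x)
    b = gamma * b + r * x
    return V, b

d = 5                                   # context dimension (illustrative)
V, b = np.eye(d), np.zeros(d)           # ridge prior (lambda = 1)
rng = np.random.default_rng(0)
theta_true = np.ones(d)
for t in range(1000):
    x = rng.normal(size=d)
    r = x @ theta_true + rng.normal()   # synthetic linear reward
    V, b = discounted_update(V, b, x, r)
theta_hat = np.linalg.solve(V, b)       # discounted least-squares estimate
```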

Hedging the drift: Learning to optimize under nonstationarity

WC Cheung, D Simchi-Levi, R Zhu - Management Science, 2022 - pubsonline.informs.org
We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic
regret bounds for a collection of nonstationary stochastic bandit settings. These settings …
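
A sliding-window UCB sketch in the spirit of such dynamic-regret algorithms; the window size, exploration constant, and class name are assumptions, not the paper's exact procedure:

```python
import math
from collections import deque

class SlidingWindowUCB:
    """UCB computed only over the last `window` plays, so rewards
    observed before a drift are eventually forgotten."""
    def __init__(self, n_arms, window=200, c=2.0):
        self.history = deque(maxlen=window)  # recent (arm, reward) pairs
        self.n_arms = n_arms
        self.c = c
        self.t = 0

    def select(self):
        self.t += 1
        counts = [0] * self.n_arms
        sums = [0.0] * self.n_arms
        for arm, r in self.history:
            counts[arm] += 1
            sums[arm] += r
        def ucb(a):
            if counts[a] == 0:
                return float("inf")          # force initial exploration
            return sums[a] / counts[a] + math.sqrt(
                self.c * math.log(self.t) / counts[a])
        return max(range(self.n_arms), key=ucb)

    def update(self, arm, reward):
        self.history.append((arm, reward))
```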

Continual learning as computationally constrained reinforcement learning

S Kumar, H Marklund, A Rao, Y Zhu, HJ Jeon… - arXiv preprint arXiv …, 2023 - arxiv.org
An agent that efficiently accumulates knowledge to develop increasingly sophisticated skills
over a long lifetime could advance the frontier of artificial intelligence capabilities. The …

Module-wise adaptive distillation for multimodality foundation models

C Liang, J Yu, MH Yang, M Brown… - Advances in …, 2024 - proceedings.neurips.cc
Pre-trained multimodal foundation models have demonstrated remarkable generalizability
but pose challenges for deployment due to their large sizes. One effective approach to …
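
A hedged sketch of what a module-wise distillation objective can look like: per-module feature matching with tunable weights, the knob an adaptive scheme could adjust online. The MSE matching loss and the function name are illustrative, not the paper's exact objective:

```python
import torch
import torch.nn.functional as F

def module_wise_distill_loss(student_feats, teacher_feats, weights):
    """Sum of per-module feature-matching losses between student and
    teacher hidden states, one weight per module."""
    loss = torch.zeros(())
    for w, s, t in zip(weights, student_feats, teacher_feats):
        loss = loss + w * F.mse_loss(s, t.detach())
    return loss

# toy usage: three "modules" with matching feature shapes
teacher = [torch.randn(8, 16) for _ in range(3)]
student = [t + 0.1 * torch.randn_like(t) for t in teacher]
print(module_wise_distill_loss(student, teacher, weights=[1.0, 0.5, 0.5]))
```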

An information-theoretic analysis of nonstationary bandit learning

S Min, D Russo - International Conference on Machine …, 2023 - proceedings.mlr.press
In nonstationary bandit learning problems, the decision-maker must continually gather
information and adapt their action selection as the latent state of the environment evolves. In …
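
As a toy model of the latent-state drift this analysis studies, one can use an environment whose arm means follow a bounded random walk (all parameters here are illustrative):

```python
import numpy as np

class DriftingBernoulliBandit:
    """Bernoulli bandit whose latent arm means evolve as a clipped
    random walk between interactions."""
    def __init__(self, n_arms=3, sigma=0.01, seed=0):
        self.rng = np.random.default_rng(seed)
        self.means = self.rng.uniform(0.2, 0.8, size=n_arms)
        self.sigma = sigma

    def pull(self, arm):
        reward = float(self.rng.random() < self.means[arm])
        # latent state evolves after each interaction
        self.means = np.clip(
            self.means + self.rng.normal(0, self.sigma, self.means.size),
            0.0, 1.0)
        return reward
```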

Non stationary multi-armed bandit: Empirical evaluation of a new concept drift-aware algorithm

E Cavenaghi, G Sottocornola, F Stella, M Zanker - Entropy, 2021 - mdpi.com
The Multi-Armed Bandit (MAB) problem has been extensively studied in order to address
real-world challenges related to sequential decision making. In this setting, an agent selects …
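
A generic sketch of one drift-aware idea: Thompson sampling whose Beta posteriors decay toward the prior each round, so stale evidence is forgotten. The discount gamma and the class name are assumptions, and the paper's concept drift-aware algorithm differs in its details:

```python
import numpy as np

class DiscountedTS:
    """Bernoulli Thompson sampling with posterior decay toward the
    Beta(1, 1) prior, allowing adaptation to concept drift."""
    def __init__(self, n_arms, gamma=0.99, seed=0):
        self.alpha = np.ones(n_arms)
        self.beta = np.ones(n_arms)
        self.gamma = gamma
        self.rng = np.random.default_rng(seed)

    def select(self):
        return int(np.argmax(self.rng.beta(self.alpha, self.beta)))

    def update(self, arm, reward):
        self.alpha = 1 + self.gamma * (self.alpha - 1)  # forget old evidence
        self.beta = 1 + self.gamma * (self.beta - 1)
        self.alpha[arm] += reward
        self.beta[arm] += 1 - reward
```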

An empirical evaluation of active inference in multi-armed bandits

D Marković, H Stojić, S Schwöbel, SJ Kiebel - Neural Networks, 2021 - Elsevier
A key feature of sequential decision making under uncertainty is a need to balance between
exploiting—choosing the best action according to the current knowledge, and exploring …
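
The exploit/explore balance described here can be illustrated with the simplest baseline, epsilon-greedy (not the paper's active-inference agent; eps = 0.1 is illustrative):

```python
import numpy as np

def epsilon_greedy_step(counts, sums, rng, eps=0.1):
    """With probability eps explore a uniformly random arm; otherwise
    exploit the arm with the best empirical mean."""
    if counts.min() == 0 or rng.random() < eps:
        return int(rng.integers(len(counts)))    # explore
    return int(np.argmax(sums / counts))         # exploit

rng = np.random.default_rng(0)
true_means = np.array([0.3, 0.5, 0.7])
counts, sums = np.zeros(3), np.zeros(3)
for _ in range(500):
    a = epsilon_greedy_step(counts, sums, rng)
    r = float(rng.random() < true_means[a])      # Bernoulli reward
    counts[a] += 1
    sums[a] += r
```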

Recovering bandits

C Pike-Burke, S Grunewalder - Advances in Neural …, 2019 - proceedings.neurips.cc
We study the recovering bandits problem, a variant of the stochastic multi-armed bandit
problem where the expected reward of each arm varies according to some unknown …
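
A toy environment makes the setting concrete: each arm's expected reward is a function of the time since that arm was last played. The saturating recovery curves below are illustrative; the paper treats the recovery functions as unknown:

```python
import numpy as np

class RecoveringBandit:
    """Each arm's mean reward grows with its idle time z, then
    plateaus; playing an arm resets its z to zero."""
    def __init__(self, n_arms=3, z_max=20, seed=0):
        self.rng = np.random.default_rng(seed)
        self.z = np.full(n_arms, z_max)   # time since each arm was played
        self.z_max = z_max
        self.rates = self.rng.uniform(0.1, 0.5, size=n_arms)

    def mean(self, arm):
        # saturating recovery curve, capped at z_max rounds of idling
        return 1.0 - np.exp(-self.rates[arm] * min(self.z[arm], self.z_max))

    def pull(self, arm):
        reward = self.mean(arm) + 0.05 * self.rng.normal()
        self.z += 1
        self.z[arm] = 0
        return reward
```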

Nonstationary bandit learning via predictive sampling

Y Liu, B Van Roy, K Xu - International Conference on …, 2023 - proceedings.mlr.press
Thompson sampling has proven effective across a wide range of stationary bandit
environments. However, as we demonstrate in this paper, it can perform poorly when …
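
For reference, the standard Bernoulli Thompson sampling baseline whose behavior under nonstationarity the paper examines; this is the textbook algorithm, not the proposed predictive-sampling variant:

```python
import numpy as np

def thompson_sampling(env, n_arms, horizon, seed=0):
    """Textbook Bernoulli Thompson sampling: sample a mean for each
    arm from its Beta posterior, play the argmax, update."""
    rng = np.random.default_rng(seed)
    alpha = np.ones(n_arms)
    beta = np.ones(n_arms)
    for _ in range(horizon):
        arm = int(np.argmax(rng.beta(alpha, beta)))
        reward = env(arm)                 # env returns a 0/1 reward
        alpha[arm] += reward
        beta[arm] += 1 - reward
    return alpha, beta

# e.g. a fixed Bernoulli environment (stationary, where TS does well)
rng = np.random.default_rng(1)
env = lambda arm: float(rng.random() < [0.2, 0.5, 0.8][arm])
alpha, beta = thompson_sampling(env, n_arms=3, horizon=2000)
```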

Autosem: Automatic task selection and mixing in multi-task learning

H Guo, R Pasunuru, M Bansal - arXiv preprint arXiv:1904.04153, 2019 - arxiv.org
Multi-task learning (MTL) has achieved success over a wide range of problems, where the
goal is to improve the performance of a primary task using a set of relevant auxiliary tasks …
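
A generic sketch of bandit-based task selection for multi-task learning: treat each candidate auxiliary task as an arm and reward it when training on it helps the primary task. The Thompson-sampling sampler, all names, and the improvement signal below are assumptions, not AutoSeM's exact procedure:

```python
import numpy as np

class BanditTaskSampler:
    """Pick the next auxiliary task via Thompson sampling over Beta
    posteriors; reward a task when the training round it was chosen
    for improves the primary task's validation metric."""
    def __init__(self, n_tasks, seed=0):
        self.alpha = np.ones(n_tasks)
        self.beta = np.ones(n_tasks)
        self.rng = np.random.default_rng(seed)

    def pick_task(self):
        return int(np.argmax(self.rng.beta(self.alpha, self.beta)))

    def feedback(self, task, improved):
        # improved: did the primary-task validation metric go up?
        self.alpha[task] += float(improved)
        self.beta[task] += 1.0 - float(improved)
```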