Taming non-stationary bandits: A Bayesian approach

Y Russac, C Vernade, O Cappé - Advances in Neural …, 2019 - proceedings.neurips.cc

We consider a stochastic linear bandit model in which the available actions correspond to
arbitrary context vectors whose associated rewards follow a non-stationary linear regression …

Cited by 129 · Related articles · All 14 versions

[PDF] arxiv.org

Hedging the drift: Learning to optimize under nonstationarity

WC Cheung, D Simchi-Levi, R Zhu - Management Science, 2022 - pubsonline.informs.org

We introduce data-driven decision-making algorithms that achieve state-of-the-art dynamic
regret bounds for a collection of nonstationary stochastic bandit settings. These settings …

Cited by 112 · Related articles · All 11 versions

[PDF] arxiv.org

Continual learning as computationally constrained reinforcement learning

S Kumar, H Marklund, A Rao, Y Zhu, HJ Jeon… - arXiv preprint arXiv …, 2023 - arxiv.org

An agent that efficiently accumulates knowledge to develop increasingly sophisticated skills
over a long lifetime could advance the frontier of artificial intelligence capabilities. The …

Cited by 13 · Related articles · All 2 versions

[PDF] neurips.cc

Module-wise adaptive distillation for multimodality foundation models

C Liang, J Yu, MH Yang, M Brown… - Advances in …, 2024 - proceedings.neurips.cc

Pre-trained multimodal foundation models have demonstrated remarkable generalizability
but pose challenges for deployment due to their large sizes. One effective approach to …

Cited by 2 · Related articles · All 6 versions

[PDF] mlr.press

An information-theoretic analysis of nonstationary bandit learning

S Min, D Russo - International Conference on Machine …, 2023 - proceedings.mlr.press

In nonstationary bandit learning problems, the decision-maker must continually gather
information and adapt their action selection as the latent state of the environment evolves. In …

Cited by 7 · Related articles · All 6 versions

[PDF] mdpi.com

Non stationary multi-armed bandit: Empirical evaluation of a new concept drift-aware algorithm

E Cavenaghi, G Sottocornola, F Stella, M Zanker - Entropy, 2021 - mdpi.com

The Multi-Armed Bandit (MAB) problem has been extensively studied in order to address
real-world challenges related to sequential decision making. In this setting, an agent selects …

Cited by 36 · Related articles · All 11 versions

[HTML] sciencedirect.com

[HTML] An empirical evaluation of active inference in multi-armed bandits

D Marković, H Stojić, S Schwöbel, SJ Kiebel - Neural Networks, 2021 - Elsevier

A key feature of sequential decision making under uncertainty is a need to balance between
exploiting—choosing the best action according to the current knowledge, and exploring …

Cited by 30 · Related articles · All 9 versions

[PDF] neurips.cc

Recovering bandits

C Pike-Burke, S Grunewalder - Advances in Neural …, 2019 - proceedings.neurips.cc

We study the recovering bandits problem, a variant of the stochastic multi-armed bandit
problem where the expected reward of each arm varies according to some unknown …

Cited by 47 · Related articles · All 10 versions

[PDF] mlr.press

Nonstationary bandit learning via predictive sampling

Y Liu, B Van Roy, K Xu - International Conference on …, 2023 - proceedings.mlr.press

Thompson sampling has proven effective across a wide range of stationary bandit
environments. However, as we demonstrate in this paper, it can perform poorly when …

Cited by 14 · Related articles · All 3 versions

[PDF] arxiv.org

Autosem: Automatic task selection and mixing in multi-task learning

H Guo, R Pasunuru, M Bansal - arXiv preprint arXiv:1904.04153, 2019 - arxiv.org

Multi-task learning (MTL) has achieved success over a wide range of problems, where the
goal is to improve the performance of a primary task using a set of relevant auxiliary tasks …

Cited by 54 · Related articles · All 3 versions
