What doubling tricks can and can't do for multi-armed bandits

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 3253 · Related articles · All 9 versions

[PDF] mlr.press

Reinforcement learning in feature space: Matrix bandit, kernels, and regret bound

L Yang, M Wang - International Conference on Machine …, 2020 - proceedings.mlr.press

Exploration in reinforcement learning (RL) suffers from the curse of dimensionality when the
state-action space is large. A common practice is to parameterize the high-dimensional …

Cited by 336 · Related articles · All 6 versions

[PDF] mlr.press

Logarithmic regret for reinforcement learning with linear function approximation

J He, D Zhou, Q Gu - International Conference on Machine …, 2021 - proceedings.mlr.press

Reinforcement learning (RL) with linear function approximation has received increasing
attention recently. However, existing work has focused on obtaining $\sqrt{T}$-type regret …

Cited by 108 · Related articles · All 5 versions

[PDF] mlr.press

Provably efficient reinforcement learning for discounted MDPs with feature mapping

D Zhou, J He, Q Gu - International Conference on Machine …, 2021 - proceedings.mlr.press

Modern tasks in reinforcement learning have large state and action spaces. To deal with
them efficiently, one often uses predefined feature mapping to represent states and actions …

Cited by 148 · Related articles · All 5 versions

[PDF] aaai.org

Sample efficient reinforcement learning with REINFORCE

J Zhang, J Kim, B O'Donoghue, S Boyd - Proceedings of the AAAI …, 2021 - ojs.aaai.org

Policy gradient methods are among the most effective methods for large-scale reinforcement
learning, and their empirical success has prompted several works that develop the …

Cited by 115 · Related articles · All 10 versions

[PDF] mlr.press

Adaptively tracking the best bandit arm with an unknown number of distribution changes

P Auer, P Gajane, R Ortner - Conference on Learning Theory, 2019 - proceedings.mlr.press

We consider the variant of the stochastic multi-armed bandit problem where the stochastic
reward distributions may change abruptly several times. In contrast to previous work, we are …

Cited by 150 · Related articles · All 6 versions

[PDF] mlr.press

An optimal algorithm for stochastic and adversarial bandits

J Zimmert, Y Seldin - The 22nd International Conference on …, 2019 - proceedings.mlr.press

We derive an algorithm that achieves the optimal (up to constants) pseudo-regret in both
adversarial and stochastic multi-armed bandits without prior knowledge of the regime and …

Cited by 129 · Related articles · All 4 versions

[PDF] jmlr.org

Tsallis-INF: An optimal algorithm for stochastic and adversarial bandits

J Zimmert, Y Seldin - Journal of Machine Learning Research, 2021 - jmlr.org

We derive an algorithm that achieves the optimal (within constants) pseudo-regret in both
adversarial and stochastic multi-armed bandits without prior knowledge of the regime and …

Cited by 129 · Related articles · All 6 versions

[PDF] acm.org

Social learning in multi agent multi armed bandits

A Sankararaman, A Ganesh, S Shakkottai - Proceedings of the ACM on …, 2019 - dl.acm.org

Motivated by emerging need of learning algorithms for large scale networked and
decentralized systems, we introduce a distributed version of the classical stochastic Multi …

Cited by 101 · Related articles · All 10 versions

[PDF] mlr.press

Constrained efficient global optimization of expensive black-box functions

W Xu, Y Jiang, B Svetozarevic… - … Conference on Machine …, 2023 - proceedings.mlr.press

We study the problem of constrained efficient global optimization, where both the objective
and constraints are expensive black-box functions that can be learned with Gaussian …

Cited by 32 · Related articles · All 10 versions
