[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Reinforcement learning in feature space: Matrix bandit, kernels, and regret bound

L Yang, M Wang - International Conference on Machine …, 2020 - proceedings.mlr.press
Exploration in reinforcement learning (RL) suffers from the curse of dimensionality when the
state-action space is large. A common practice is to parameterize the high-dimensional …

Logarithmic regret for reinforcement learning with linear function approximation

J He, D Zhou, Q Gu - International Conference on Machine …, 2021 - proceedings.mlr.press
Reinforcement learning (RL) with linear function approximation has received increasing
attention recently. However, existing work has focused on obtaining $\sqrt{T}$-type regret …

Provably efficient reinforcement learning for discounted mdps with feature mapping

D Zhou, J He, Q Gu - International Conference on Machine …, 2021 - proceedings.mlr.press
Modern tasks in reinforcement learning have large state and action spaces. To deal with
them efficiently, one often uses predefined feature mapping to represent states and actions …

Sample efficient reinforcement learning with REINFORCE

J Zhang, J Kim, B O'Donoghue, S Boyd - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Policy gradient methods are among the most effective methods for large-scale reinforcement
learning, and their empirical success has prompted several works that develop the …

Adaptively tracking the best bandit arm with an unknown number of distribution changes

P Auer, P Gajane, R Ortner - Conference on Learning Theory, 2019 - proceedings.mlr.press
We consider the variant of the stochastic multi-armed bandit problem where the stochastic
reward distributions may change abruptly several times. In contrast to previous work, we are …

An optimal algorithm for stochastic and adversarial bandits

J Zimmert, Y Seldin - The 22nd International Conference on …, 2019 - proceedings.mlr.press
We derive an algorithm that achieves the optimal (up to constants) pseudo-regret in both
adversarial and stochastic multi-armed bandits without prior knowledge of the regime and …

Tsallis-inf: An optimal algorithm for stochastic and adversarial bandits

J Zimmert, Y Seldin - Journal of Machine Learning Research, 2021 - jmlr.org
We derive an algorithm that achieves the optimal (within constants) pseudo-regret in both
adversarial and stochastic multi-armed bandits without prior knowledge of the regime and …

Social learning in multi agent multi armed bandits

A Sankararaman, A Ganesh, S Shakkottai - Proceedings of the ACM on …, 2019 - dl.acm.org
Motivated by emerging need of learning algorithms for large scale networked and
decentralized systems, we introduce a distributed version of the classical stochastic Multi …

Constrained efficient global optimization of expensive black-box functions

W Xu, Y Jiang, B Svetozarevic… - … Conference on Machine …, 2023 - proceedings.mlr.press
We study the problem of constrained efficient global optimization, where both the objective
and constraints are expensive black-box functions that can be learned with Gaussian …