Combinatorial multi-armed bandit: General framework and applications

W Chen, Y Wang, Y Yuan - International conference on …, 2013 - proceedings.mlr.press
We define a general framework for a large class of combinatorial multi-armed bandit (CMAB)
problems, where simple arms with unknown distributions form super arms. In each round …

Combinatorial multi-armed bandit and its extension to probabilistically triggered arms

W Chen, Y Wang, Y Yuan, Q Wang - Journal of Machine Learning …, 2016 - jmlr.org
In the past few years, differential privacy has become a standard concept in the area of
privacy. One of the most important problems in this field is to answer queries while …

Optimistic knowledge gradient policy for optimal budget allocation in crowdsourcing

X Chen, Q Lin, D Zhou - International conference on …, 2013 - proceedings.mlr.press
In real crowdsourcing applications, each label from a crowd usually comes with a certain
cost. Given a pre-fixed amount of budget, since different tasks have different ambiguities and …

Deterministic sequencing of exploration and exploitation for multi-armed bandit problems

S Vakili, K Liu, Q Zhao - IEEE Journal of Selected Topics in …, 2013 - ieeexplore.ieee.org
In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward
models. At each time, a player selects one arm to play, aiming to maximize the total …

Reinforcement learning based stochastic shortest path finding in wireless sensor networks

W Xia, C Di, H Guo, S Li - IEEE Access, 2019 - ieeexplore.ieee.org
Many factors influence the connection states between nodes of wireless sensor networks,
such as physical distance, and the network load, making the network's edge length dynamic …

Statistical efficiency of Thompson sampling for combinatorial semi-bandits

P Perrault, E Boursier, M Valko… - Advances in Neural …, 2020 - proceedings.neurips.cc
We investigate stochastic combinatorial multi-armed bandit with semi-bandit feedback
(CMAB). In CMAB, the question of the existence of an efficient policy with an optimal …

Stochastic online shortest path routing: The value of feedback

MS Talebi, Z Zou, R Combes… - … on Automatic Control, 2017 - ieeexplore.ieee.org
This paper studies online shortest path routing over multihop networks. Link costs or delays
are time varying and modeled by independent and identically distributed random processes …

Matching while learning

R Johari, V Kamble, Y Kanoria - Operations Research, 2021 - pubsonline.informs.org
We consider the problem faced by a service platform that needs to match limited supply with
demand while learning the attributes of new users to match them better in the future. We …

Online learning of energy consumption for navigation of electric vehicles

N Åkerblom, Y Chen, MH Chehreghani - Artificial Intelligence, 2023 - Elsevier
Energy efficient navigation constitutes an important challenge in electric vehicles, due to
their limited battery capacity. We employ a Bayesian approach to model the energy …

No-regret algorithms for heavy-tailed linear bandits

AM Medina, S Yang - International Conference on Machine …, 2016 - proceedings.mlr.press
We analyze the problem of linear bandits under heavy tailed noise. Most of the work on
linear bandits has been based on the assumption of bounded or sub-Gaussian noise …