Adaptive shortest-path routing under unknown and stochastically varying link states

W Chen, Y Wang, Y Yuan - International conference on …, 2013 - proceedings.mlr.press

We define a general framework for a large class of combinatorial multi-armed bandit (CMAB)
problems, where simple arms with unknown distributions form super arms. In each round …

Cited by 877 Related articles All 13 versions

[PDF] jmlr.org

Combinatorial multi-armed bandit and its extension to probabilistically triggered arms

W Chen, Y Wang, Y Yuan, Q Wang - Journal of Machine Learning …, 2016 - jmlr.org

In the past few years, differential privacy has become a standard concept in the area of
privacy. One of the most important problems in this field is to answer queries while …

Cited by 262 Related articles All 10 versions

[PDF] mlr.press

Optimistic knowledge gradient policy for optimal budget allocation in crowdsourcing

X Chen, Q Lin, D Zhou - International conference on …, 2013 - proceedings.mlr.press

In real crowdsourcing applications, each label from a crowd usually comes with a certain
cost. Given a pre-fixed amount of budget, since different tasks have different ambiguities and …

Cited by 153 Related articles All 12 versions

[PDF] arxiv.org

Deterministic sequencing of exploration and exploitation for multi-armed bandit problems

S Vakili, K Liu, Q Zhao - IEEE Journal of Selected Topics in …, 2013 - ieeexplore.ieee.org

In the Multi-Armed Bandit (MAB) problem, there is a given set of arms with unknown reward
models. At each time, a player selects one arm to play, aiming to maximize the total …

Cited by 128 Related articles All 10 versions

[PDF] ieee.org

Reinforcement learning based stochastic shortest path finding in wireless sensor networks

W Xia, C Di, H Guo, S Li - IEEE Access, 2019 - ieeexplore.ieee.org

Many factors influence the connection states between nodes of wireless sensor networks,
such as physical distance, and the network load, making the network's edge length dynamic …

Cited by 45 Related articles All 3 versions

[PDF] neurips.cc

Statistical efficiency of Thompson sampling for combinatorial semi-bandits

P Perrault, E Boursier, M Valko… - Advances in Neural …, 2020 - proceedings.neurips.cc

We investigate stochastic combinatorial multi-armed bandit with semi-bandit feedback
(CMAB). In CMAB, the question of the existence of an efficient policy with an optimal …

Cited by 47 Related articles All 9 versions

[PDF] hal.science

Stochastic online shortest path routing: The value of feedback

MS Talebi, Z Zou, R Combes… - … on Automatic Control, 2017 - ieeexplore.ieee.org

This paper studies online shortest path routing over multihop networks. Link costs or delays
are time varying and modeled by independent and identically distributed random processes …

Cited by 79 Related articles All 6 versions

[PDF] nsf.gov

Matching while learning

R Johari, V Kamble, Y Kanoria - Operations Research, 2021 - pubsonline.informs.org

We consider the problem faced by a service platform that needs to match limited supply with
demand while learning the attributes of new users to match them better in the future. We …

Cited by 55 Related articles All 7 versions

[HTML] sciencedirect.com

Online learning of energy consumption for navigation of electric vehicles

N Åkerblom, Y Chen, MH Chehreghani - Artificial Intelligence, 2023 - Elsevier

Energy efficient navigation constitutes an important challenge in electric vehicles, due to
their limited battery capacity. We employ a Bayesian approach to model the energy …

Cited by 17 Related articles All 8 versions

[PDF] mlr.press

No-regret algorithms for heavy-tailed linear bandits

AM Medina, S Yang - International Conference on Machine …, 2016 - proceedings.mlr.press

We analyze the problem of linear bandits under heavy tailed noise. Most of the work on
linear bandits has been based on the assumption of bounded or sub-Gaussian noise …

Cited by 44 Related articles All 8 versions
