Learning to control renewal processes with bandit feedback

Z Zhu, J Zhu, J Liu, Y Liu - Proceedings of the ACM on Measurement …, 2021 - dl.acm.org

In this paper, we study Federated Bandit, a decentralized Multi-Armed Bandit problem with a
set of N agents, who can only communicate their local data with neighbors described by a …

Cited by 95 · Related articles · All 8 versions

Budget-constrained bandits over general cost and reward distributions

S Cayci, A Eryilmaz, R Srikant - International Conference on …, 2020 - proceedings.mlr.press

We consider a budget-constrained bandit problem where each arm pull incurs a random
cost, and yields a random reward in return. The objective is to maximize the total expected …

Cited by 35 · Related articles · All 8 versions

Online task scheduling and termination with throughput constraint

Q Liu, Z Fang - IEEE/ACM Transactions on Networking, 2024 - ieeexplore.ieee.org

We consider the task scheduling scenario where the controller activates one from task types
at each time. Each task induces a random completion time, and a reward is obtained only …

Cited by 1 · Related articles · All 4 versions

Group-fair online allocation in continuous time

S Cayci, S Gupta, A Eryilmaz - Advances in Neural …, 2020 - proceedings.neurips.cc

The theory of discrete-time online learning has been successfully applied in many problems
that involve sequential decision-making under uncertainty. However, in many applications …

Cited by 21 · Related articles · All 7 versions

Bandit task assignment with unknown processing time

S Ito, D Hatano, H Sumita… - Advances in …, 2024 - proceedings.neurips.cc

This study considers a novel problem setting, referred to as bandit task assignment,
that incorporates the processing time of each task in the bandit setting. In this problem …

Effort Level Search in Infinite Completion Trees with Application to Task-and-Motion Planning

M Toussaint, J Ortiz-Haro, VN Hartmann… - … on Robotics and …, 2024 - ieeexplore.ieee.org

Solving a Task-and-Motion Planning (TAMP) problem can be represented as a sequential
(meta-) decision process, where early decisions concern the skeleton (sequence of logic …

Cited by 1 · Related articles

Fast learning for renewal optimization in online task scheduling

MJ Neely - Journal of Machine Learning Research, 2021 - jmlr.org

This paper considers online optimization of a renewal-reward system. A controller performs
a sequence of tasks back-to-back. Each task has a random vector of parameters, called the …

Cited by 14 · Related articles · All 7 versions

Multi-armed bandit problem with temporally-partitioned rewards: When partial feedback counts

G Romano, A Agostini, F Trovò, N Gatti… - arXiv preprint arXiv …, 2022 - arxiv.org

There is a rising interest in industrial online applications where data becomes available
sequentially. Inspired by the recommendation of playlists to users where their preferences …

Cited by 5 · Related articles · All 8 versions

Learning to Schedule Online Tasks with Bandit Feedback

Y Xu, S Wang, H Guo, X Liu, Z Shao - arXiv preprint arXiv:2402.16463, 2024 - arxiv.org

Online task scheduling serves an integral role for task-intensive applications in cloud
computing and crowdsourcing. Optimal scheduling can enhance system performance …

Bandits with Stochastic Experts: Constant Regret, Empirical Experts and Episodes

N Sharma, R Sen, S Basu, K Shanmugam… - ACM Transactions on …, 2024 - dl.acm.org

We study a variant of the contextual bandit problem where an agent can intervene through a
set of stochastic expert policies. Given a fixed context, each expert samples actions from a …
