We consider a budget-constrained bandit problem where each arm pull incurs a random cost and yields a random reward in return. The objective is to maximize the total expected …
Q Liu, Z Fang - IEEE/ACM Transactions on Networking, 2024 - ieeexplore.ieee.org
We consider the task scheduling scenario where the controller activates one of several task types at each time. Each task induces a random completion time, and a reward is obtained only …
The theory of discrete-time online learning has been successfully applied in many problems that involve sequential decision-making under uncertainty. However, in many applications …
This study considers a novel problem setting, referred to as "bandit task assignment", that incorporates the processing time of each task in the bandit setting. In this problem …
Solving a Task-and-Motion Planning (TAMP) problem can be represented as a sequential (meta-) decision process, where early decisions concern the skeleton (sequence of logic …
This paper considers online optimization of a renewal-reward system. A controller performs a sequence of tasks back-to-back. Each task has a random vector of parameters, called the …
There is a rising interest in industrial online applications where data becomes available sequentially. Inspired by the recommendation of playlists to users where their preferences …
Online task scheduling serves an integral role for task-intensive applications in cloud computing and crowdsourcing. Optimal scheduling can enhance system performance …
We study a variant of the contextual bandit problem where an agent can intervene through a set of stochastic expert policies. Given a fixed context, each expert samples actions from a …