作者
Stefano Boldrini, Luca De Nardis, Giuseppe Caso, Mai TP Le, Jocelyn Fiorina, Maria-Gabriella Di Benedetto
发表日期
2018/1/26
期刊
Algorithms
卷号
11
期号
2
页码范围
13
出版商
MDPI
简介
Multi-armed bandit (MAB) models are a viable approach to describe the problem of best wireless network selection by a multi-Radio Access Technology (multi-RAT) device, with the goal of maximizing the quality perceived by the final user. The classical MAB model does not allow, however, to properly describe the problem of wireless network selection by a multi-RAT device, in which a device typically performs a set of measurements in order to collect information on available networks, before a selection takes place. The MAB model foresees in fact only one possible action for the player, which is the selection of one among different arms at each time step; existing arm selection algorithms thus mainly differ in the rule according to which a specific arm is selected. This work proposes a new MAB model, named measure-use-MAB (muMAB), aiming at providing a higher flexibility, and thus a better accuracy in describing the network selection problem. The muMAB model extends the classical MAB model in a twofold manner; first, it foresees two different actions: to measure and to use; second, it allows actions to span over multiple time steps. Two new algorithms designed to take advantage of the higher flexibility provided by the muMAB model are also introduced. The first one, referred to as measure-use-UCB1 (muUCB1) is derived from the well known UCB1 algorithm, while the second one, referred to as Measure with Logarithmic Interval (MLI), is appositely designed for the new model so to take advantage of the new measure action, while aggressively using the best arm. The new algorithms are compared against existing ones from the literature in …
引用总数
2018201920202021202220232024151010672
学术搜索中的文章