Combinatorial network optimization with unknown variables: Multi-armed bandits with linear rewards and individual observations

Y Gai, B Krishnamachari, R Jain - IEEE/ACM Transactions on …, 2012 - ieeexplore.ieee.org

We formulate the following combinatorial multi-armed bandit (MAB) problem: There are N
random variables with unknown mean that are each instantiated in an iid fashion over time.
At each time multiple random variables can be selected, subject to an arbitrary constraint on
weights associated with the selected variables. All of the selected individual random
variables are observed at that time, and a linearly weighted combination of these selected
variables is yielded as the reward. The goal is to find a policy that minimizes regret, defined …

Cited by 509 · Related articles · All 12 versions
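
The setting described in this abstract can be illustrated with a small simulation. The following is only a minimal sketch under stated assumptions, not the policy from the paper: it assumes Bernoulli variables, non-negative weights, a small explicit list of feasible weight vectors, and a generic UCB-style index. The names `run_bandit`, `true_means`, and `actions` are hypothetical.

```python
import math
import random


def run_bandit(true_means, actions, horizon, seed=0):
    """Simulate a generic UCB-style policy on a linear-reward bandit.

    true_means: hidden means of the N variables (Bernoulli is an assumption).
    actions: feasible weight vectors; a zero weight means "not selected".
    Returns the cumulative expected regret after `horizon` rounds.
    """
    rng = random.Random(seed)
    n = len(true_means)
    counts = [0] * n        # how often each variable has been observed
    estimates = [0.0] * n   # running sample mean of each variable

    def expected_reward(action):
        # Linearly weighted combination of the true means.
        return sum(w * m for w, m in zip(action, true_means))

    def index(action, t):
        # Optimistic estimate of the action's reward (assumed index form).
        total = 0.0
        for i, w in enumerate(action):
            if w == 0:
                continue
            if counts[i] == 0:
                return math.inf  # force a first look at unseen variables
            bonus = math.sqrt(2 * math.log(t) / counts[i])
            total += w * (estimates[i] + bonus)
        return total

    best = max(expected_reward(a) for a in actions)
    regret = 0.0
    for t in range(1, horizon + 1):
        action = max(actions, key=lambda a: index(a, t))
        # Every selected variable is observed individually.
        for i, w in enumerate(action):
            if w != 0:
                x = 1.0 if rng.random() < true_means[i] else 0.0
                counts[i] += 1
                estimates[i] += (x - estimates[i]) / counts[i]
        regret += best - expected_reward(action)
    return regret
```

For example, with four variables and the constraint "select exactly two, each with weight 1", `actions` would be the six 0/1 vectors containing two ones. The statistics are kept per variable rather than per action, so storage grows with the number of variables, not with the number of feasible combinations.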

[PDF] arxiv.org

Combinatorial network optimization with unknown variables: Multi-armed bandits with linear rewards

Y Gai, B Krishnamachari, R Jain - arXiv preprint arXiv:1011.4748, 2010 - arxiv.org

In the classic multi-armed bandits problem, the goal is to have a policy for dynamically
operating arms that each yield stochastic rewards with unknown means. The key metric of
interest is regret, defined as the gap between the expected total reward accumulated by an
omniscient player that knows the reward means for each arm, and the expected total reward
accumulated by the given policy. The policies presented in prior work have storage,
computation and regret all growing linearly with the number of arms, which is not scalable …

Cited by 12 · Related articles · All 6 versions
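
The verbal definition of regret in this abstract can be written as a formula. The notation is an assumption made here for illustration and does not come from the listing: $\mu_i$ is the unknown mean of arm $i$, $\mu^{*} = \max_i \mu_i$, $\pi(t)$ is the arm the policy plays at time $t$, and $X_i(t)$ is the reward of arm $i$ at time $t$.

```latex
% Regret of policy \pi after n plays, single-arm (classic) setting:
% reward of an omniscient player minus expected reward of the policy.
R^{\pi}(n) \;=\; n\,\mu^{*} \;-\; \mathbb{E}\!\left[\sum_{t=1}^{n} X_{\pi(t)}(t)\right]
```

The first term is what an omniscient player would collect in expectation by always playing the best arm; the second is the expected total reward of the policy.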
