Core: Capitalizing on rewards in bandit exploration

2 results

IMO: Interactive Multi-Objective Off-Policy Optimization

N Wang, H Wang, M Karimzadehgan, B Kveton… - arXiv preprint arXiv …, 2022 - arxiv.org

Most real-world optimization problems have multiple objectives. A system designer needs to
find a policy that trades off these objectives to reach a desired operating point. This problem …

Cited by 4 · All 8 versions

Guaranteed Fixed-Confidence Best Arm Identification in Multi-Armed Bandits: Simple Sequential Elimination Algorithms

MJ Azizi, SM Ross, Z Zhang - arXiv preprint arXiv:2106.06848, 2021 - arxiv.org

We consider the problem of finding, through adaptive sampling, which of $n$ options
(arms) has the largest mean. Our objective is to determine a rule which identifies the best …

Cited by 1 · All 2 versions
