Optimal Exploration is no harder than Thompson Sampling

5 results (0.02 seconds)

Online Non-Stationary Pricing Incentives for Budget-Limited Crowdsensing

J Sun, D Wu - IEEE Transactions on Big Data, 2024 - ieeexplore.ieee.org

The promising applications of mobile crowdsensing (MCS) have attracted much research
interest recently, especially for the posted-pricing scenes. However, existing works mainly …

[PDF] Pareto Set Identification With Posterior Sampling

C Kone, M Jourdan, E Kaufmann - arXiv preprint arXiv:2411.04939, 2024 - arxiv.org

The problem of identifying the best answer among a collection of items having real-valued
distribution is well-understood. Despite its practical relevance for many applications, fewer …

Enhancing Preference-based Linear Bandits via Human Response Time

S Li, Y Zhang, Z Ren, C Liang, N Li, JA Shah - arXiv preprint arXiv …, 2024 - arxiv.org

Interactive preference learning systems present humans with queries as pairs of options;
humans then select their preferred choice, allowing the system to infer preferences from …

Adaptive Experimentation When You Can't Experiment

Y Zhao, KS Jun, T Fiez, L Jain - arXiv preprint arXiv:2406.10738, 2024 - arxiv.org

This paper introduces the confounded pure exploration transductive linear
bandit (CPET-LB) problem. As a motivating example, often online services cannot …

Cited by 1 · Related articles · All 3 versions

[PDF] ssrn.com

Nonparametric bandits leveraging informational externalities to learn the demand curve

I Weaver, V Kumar - Available at SSRN 4263133, 2024 - papers.ssrn.com

We propose a novel theory-based approach to the reinforcement learning problem of
maximizing profits when faced with an unknown demand curve. Our method is based on …

Cited by 2 · Related articles · All 2 versions
