The problem of identifying the best answer among a collection of items with real-valued distributions is well understood. Despite its practical relevance for many applications, fewer …
Interactive preference learning systems present humans with queries as pairs of options; humans then select their preferred choice, allowing the system to infer preferences from …
This paper introduces the confounded pure exploration transductive linear bandit (CPET-LB) problem. As a motivating example, often online services cannot …
I Weaver, V Kumar - Available at SSRN 4263133, 2024 - papers.ssrn.com
We propose a novel theory-based approach to the reinforcement learning problem of maximizing profits when faced with an unknown demand curve. Our method is based on …