Reinforcement learning with history dependent dynamic contexts

G Tennenholtz, N Merlis, L Shani… - International …, 2023 - proceedings.mlr.press
We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel
reinforcement learning framework for history-dependent environments that generalizes the …

Batch Ensemble for Variance Dependent Regret in Stochastic Bandits

A Cassel, O Levy, Y Mansour - arXiv preprint arXiv:2409.08570, 2024 - arxiv.org
Efficiently trading off exploration and exploitation is one of the key challenges in online
Reinforcement Learning (RL). Most works achieve this by carefully estimating the model …

Ranking with Popularity Bias: User Welfare under Self-Amplification Dynamics

G Tennenholtz, M Mladenov, N Merlis, RL Axtell… - arXiv preprint arXiv …, 2023 - arxiv.org
While popularity bias is recognized to play a crucial role in recommender (and other
ranking-based) systems, detailed analysis of its impact on collective user welfare has largely …

"You just can't go around killing people" Explaining Agent Behavior to a Human Terminator

U Menkes, O Amir, A Hallak - ICML 2024 Workshop on Models of Human … - openreview.net
Consider a setting where a pre-trained agent is operating in an environment and a human
operator can decide to temporarily terminate its operation and take over for some duration of …