Reinforcement learning with a terminator

4 results (0.02 seconds)

[PDF] mlr.press

Reinforcement learning with history dependent dynamic contexts

G Tennenholtz, N Merlis, L Shani… - International …, 2023 - proceedings.mlr.press

Abstract We introduce Dynamic Contextual Markov Decision Processes (DCMDPs), a novel
reinforcement learning framework for history-dependent environments that generalizes the …

Cited by 3 · Related articles · All 18 versions

[PDF] arxiv.org

Batch Ensemble for Variance Dependent Regret in Stochastic Bandits

A Cassel, O Levy, Y Mansour - arXiv preprint arXiv:2409.08570, 2024 - arxiv.org

Efficiently trading off exploration and exploitation is one of the key challenges in online
Reinforcement Learning (RL). Most works achieve this by carefully estimating the model …

Ranking with Popularity Bias: User Welfare under Self-Amplification Dynamics

G Tennenholtz, M Mladenov, N Merlis, RL Axtell… - arXiv preprint arXiv …, 2023 - arxiv.org

While popularity bias is recognized to play a crucial role in recommender (and other
ranking-based) systems, detailed analysis of its impact on collective user welfare has largely …

"You just can't go around killing people" Explaining Agent Behavior to a Human Terminator

U Menkes, O Amir, A Hallak - ICML 2024 Workshop on Models of Human … - openreview.net

Consider a setting where a pre-trained agent is operating in an environment and a human
operator can decide to temporarily terminate its operation and take-over for some duration of …
