Loosely consistent emphatic temporal-difference learning

3 results

Streaming Deep Reinforcement Learning Finally Works

M Elsayed, G Vasan, AR Mahmood - arXiv preprint arXiv:2410.14606, 2024 - arxiv.org

Natural intelligence processes experience as a continuous stream, sensing, acting, and
learning moment-by-moment in real time. Streaming learning, the modus operandi of classic …

Cited by 1; 3 versions

Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation

F Che, C Xiao, J Mei, B Dai, R Gummadi… - arXiv preprint arXiv …, 2024 - arxiv.org

We prove that the combination of a target network and over-parameterized linear function
approximation establishes a weaker convergence condition for bootstrapped value …

Cited by 3; 3 versions

Deep Reinforcement Learning Without Experience Replay, Target Networks, or Batch Updates

M Elsayed, G Vasan, AR Mahmood - … 2024 Workshop on Fine-Tuning in … - openreview.net

Natural intelligence processes experience as a continuous stream, sensing, acting, and
learning moment-by-moment in real time. Streaming learning, the modus operandi of classic …
