Mental modeling of reinforcement learning agents by language models

W Lu, X Zhao, J Spisak, JH Lee, S Wermter - arXiv preprint arXiv …, 2024 - arxiv.org
Can emergent language models faithfully model the intelligence of decision-making agents?
Though modern language models already exhibit some reasoning ability, and theoretically …

Transformers as Game Players: Provable In-context Game-playing Capabilities of Pre-trained Models

C Shi, K Yang, J Yang, C Shen - arXiv preprint arXiv:2410.09701, 2024 - arxiv.org
The in-context learning (ICL) capability of pre-trained models based on the transformer
architecture has received growing interest in recent years. While theoretical understanding …

Almost Sure Convergence of Linear Temporal Difference Learning with Arbitrary Features

J Wang, S Zhang - arXiv preprint arXiv:2409.12135, 2024 - arxiv.org
Temporal difference (TD) learning with linear function approximation, abbreviated as linear
TD, is a classic and powerful prediction algorithm in reinforcement learning. While it is well …
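
For orientation, linear TD parameterizes the value function as V(s) ≈ theta^T phi(s) and nudges theta along the one-step TD error; the paper concerns when this iteration converges, not the update itself. Below is a minimal sketch of the textbook semi-gradient TD(0) update over a recorded trajectory (the array layout and hyperparameters are illustrative assumptions, not taken from the paper):

import numpy as np

def linear_td0(features, rewards, next_features, alpha=0.1, gamma=0.99):
    # features[t]      -- feature vector phi(s_t) of the visited state
    # rewards[t]       -- reward r_{t+1} received on the transition
    # next_features[t] -- feature vector phi(s_{t+1}) of the successor state
    theta = np.zeros(features.shape[1])
    for phi, r, phi_next in zip(features, rewards, next_features):
        td_error = r + gamma * (theta @ phi_next) - (theta @ phi)
        theta += alpha * td_error * phi   # semi-gradient step on the TD error
    return theta                          # value estimate: V(s) ~= theta @ phi(s)

Classical convergence analyses assume the feature vectors are linearly independent; the title's "arbitrary features" presumably refers to dropping that assumption, in which case theta need not settle at a unique point.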

Random Policy Enables In-Context Reinforcement Learning within Trust Horizons

W Chen, S Paternain - arXiv preprint arXiv:2410.19982, 2024 - arxiv.org
Pretrained foundation models have exhibited extraordinary in-context learning performance,
allowing zero-shot generalization to new tasks not encountered during pretraining. In the …

Provable optimal transport with transformers: The essence of depth and prompt engineering

H Daneshmand - arXiv preprint arXiv:2410.19931, 2024 - arxiv.org
Can we establish provable performance guarantees for transformers? Establishing such
theoretical guarantees is a milestone in developing trustworthy generative AI. In this paper …
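
For background on the problem in the title: optimal transport between two discrete distributions seeks the cheapest coupling of their masses, and its entropy-regularized form is classically solved by Sinkhorn iterations. The sketch below shows only that classical baseline, to make the objective concrete; it does not reproduce the paper's transformer-based construction or its guarantees:

import numpy as np

def sinkhorn(a, b, cost, eps=0.1, n_iters=200):
    # a, b : source and target histograms (nonnegative, summing to 1)
    # cost : pairwise ground-cost matrix C[i, j]
    # eps  : entropic regularization strength
    K = np.exp(-cost / eps)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                # rescale to match column marginals
        u = a / (K @ v)                  # rescale to match row marginals
    return u[:, None] * K * v[None, :]   # transport plan with marginals ~ a, b

# Toy usage: couple two 3-bin histograms on a line
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.2, 0.5, 0.3])
C = np.abs(np.subtract.outer(np.arange(3.0), np.arange(3.0)))
P = sinkhorn(a, b, C)                    # P.sum(1) ~ a, P.sum(0) ~ b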

Bellman Transformer to Internalize Reinforcement Learning: TD(0) as System

D Ghosh - researchgate.net
Modern reinforcement learning (RL) systems often struggle to balance rapid decision-
making with adaptive, reflective learning—a duality that mirrors the interplay of System 1 …
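
For reference, the TD(0) rule named in the title is the one-step bootstrapped value update, with step size \alpha and discount \gamma:

V(s) \leftarrow V(s) + \alpha \left[ r + \gamma V(s') - V(s) \right]

Its cheap, incremental character is presumably what the abstract casts as fast, System-1-style processing, though the truncated snippet does not spell out that mapping.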