Mental modeling of reinforcement learning agents by language models

W Lu, X Zhao, J Spisak, JH Lee, S Wermter - arXiv preprint arXiv …, 2024 - arxiv.org
Can emergent language models faithfully model the intelligence of decision-making agents?
Though modern language models already exhibit some reasoning ability, and theoretically …

Transformers as Game Players: Provable In-context Game-playing Capabilities of Pre-trained Models

C Shi, K Yang, J Yang, C Shen - arXiv preprint arXiv:2410.09701, 2024 - arxiv.org
The in-context learning (ICL) capability of pre-trained models based on the transformer
architecture has received growing interest in recent years. While theoretical understanding …

Almost Sure Convergence of Linear Temporal Difference Learning with Arbitrary Features

J Wang, S Zhang - arXiv preprint arXiv:2409.12135, 2024 - arxiv.org
Temporal difference (TD) learning with linear function approximation, abbreviated as linear
TD, is a classic and powerful prediction algorithm in reinforcement learning. While it is well …
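
For orientation, linear TD parameterizes the value function as V(s) ≈ theta^T phi(s) and nudges theta along the one-step TD error; the paper concerns when this iteration converges, not the update itself. Below is a minimal sketch of the textbook semi-gradient TD(0) update over a recorded trajectory (the array layout and hyperparameters are illustrative assumptions, not taken from the paper):

import numpy as np

def linear_td0(features, rewards, next_features, alpha=0.1, gamma=0.99):
    # features[t]      -- feature vector phi(s_t) of the visited state
    # rewards[t]       -- reward r_{t+1} received on the transition
    # next_features[t] -- feature vector phi(s_{t+1}) of the successor state
    theta = np.zeros(features.shape[1])
    for phi, r, phi_next in zip(features, rewards, next_features):
        td_error = r + gamma * (theta @ phi_next) - (theta @ phi)
        theta += alpha * td_error * phi   # semi-gradient step on the TD error
    return theta                          # value estimate: V(s) ~= theta @ phi(s)

Classical convergence analyses assume the feature vectors are linearly independent; the title's "arbitrary features" presumably refers to dropping that assumption, in which case theta need not settle at a unique point.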

Random Policy Enables In-Context Reinforcement Learning within Trust Horizons

W Chen, S Paternain - arXiv preprint arXiv:2410.19982, 2024 - arxiv.org
Pretrained foundation models have exhibited extraordinary in-context learning performance,
allowing zero-shot generalization to new tasks not encountered during pretraining. In the …

Provable optimal transport with transformers: The essence of depth and prompt engineering

H Daneshmand - arXiv preprint arXiv:2410.19931, 2024 - arxiv.org
Can we establish provable performance guarantees for transformers? Establishing such
theoretical guarantees is a milestone in developing trustworthy generative AI. In this paper …
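
For background on the problem in the title: optimal transport between two discrete distributions seeks the cheapest coupling of their masses, and its entropy-regularized form is classically solved by Sinkhorn iterations. The sketch below shows only that classical baseline, to make the objective concrete; it does not reproduce the paper's transformer-based construction or its guarantees:

import numpy as np

def sinkhorn(a, b, cost, eps=0.1, n_iters=200):
    # a, b : source and target histograms (nonnegative, summing to 1)
    # cost : pairwise ground-cost matrix C[i, j]
    # eps  : entropic regularization strength
    K = np.exp(-cost / eps)              # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (K.T @ u)                # rescale to match column marginals
        u = a / (K @ v)                  # rescale to match row marginals
    return u[:, None] * K * v[None, :]   # transport plan with marginals ~ a, b

# Toy usage: couple two 3-bin histograms on a line
a = np.array([0.5, 0.3, 0.2])
b = np.array([0.2, 0.5, 0.3])
C = np.abs(np.subtract.outer(np.arange(3.0), np.arange(3.0)))
P = sinkhorn(a, b, C)                    # P.sum(1) ~ a, P.sum(0) ~ b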

Bellman Transformer to Internalize Reinforcement Learning: TD(0) as System

D Ghosh - researchgate.net
Modern reinforcement learning (RL) systems often struggle to balance rapid decision-
making with adaptive, reflective learning—a duality that mirrors the interplay of System 1 …
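
For reference, the TD(0) rule named in the title is the one-step bootstrapped value update, with step size \alpha and discount \gamma:

V(s) \leftarrow V(s) + \alpha \left[ r + \gamma V(s') - V(s) \right]

Its cheap, incremental character is presumably what the abstract casts as fast, System-1-style processing, though the truncated snippet does not spell out that mapping.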