Author
Sridhar Mahadevan
Publication date
2005/8/7
Book
Proceedings of the 22nd international conference on Machine learning
Pages
553-560
Description
This paper presents a novel framework called proto-reinforcement learning (PRL), based on a mathematical model of a proto-value function: these are task-independent basis functions that form the building blocks of all value functions on a given state space manifold. Proto-value functions are learned not from rewards, but instead from analyzing the topology of the state space. Formally, proto-value functions are Fourier eigenfunctions of the Laplace-Beltrami diffusion operator on the state space manifold. Proto-value functions facilitate structural decomposition of large state spaces, and form geodesically smooth orthonormal basis functions for approximating any value function. The theoretical basis for proto-value functions combines insights from spectral graph theory, harmonic analysis, and Riemannian manifolds. Proto-value functions enable a novel generation of algorithms called representation policy iteration …
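The abstract describes proto-value functions as the smoothest eigenfunctions of the graph Laplacian constructed from the state-space topology, computed without any reward signal. A minimal sketch of that construction, assuming a small grid-world state space (the grid size and number of basis functions are illustrative choices, not taken from the paper):

```python
import numpy as np

def grid_adjacency(n):
    """Adjacency matrix of an n x n grid graph (4-neighbour connectivity),
    standing in for the topology of a grid-world state space."""
    size = n * n
    A = np.zeros((size, size))
    for r in range(n):
        for c in range(n):
            i = r * n + c
            if r + 1 < n:           # edge to the state below
                A[i, i + n] = A[i + n, i] = 1
            if c + 1 < n:           # edge to the state to the right
                A[i, i + 1] = A[i + 1, i] = 1
    return A

def proto_value_functions(A, k):
    """Return the k smoothest eigenvectors of the combinatorial graph
    Laplacian L = D - A, ordered by increasing eigenvalue. These serve
    as task-independent basis functions for value-function approximation."""
    D = np.diag(A.sum(axis=1))
    L = D - A
    eigvals, eigvecs = np.linalg.eigh(L)
    return eigvecs[:, :k]

# Basis of 4 proto-value functions on a 5x5 grid world
basis = proto_value_functions(grid_adjacency(5), 4)
```

Any value function on this state space can then be approximated as a linear combination of the columns of `basis`; the first column (eigenvalue 0) is constant on a connected graph, and later columns capture progressively finer geometric structure.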
Total citations
Scholar articles
S Mahadevan - Proceedings of the 22nd international conference on …, 2005