An empirical relative value learning algorithm for non-parametric MDPs with continuous state space

H Sharma, R Jain, A Gupta. 2019 18th European Control Conference (ECC), 2019. ieeexplore.ieee.org

We propose an empirical relative value learning (ERVL) algorithm for non-parametric MDPs with continuous state space, finite actions, and the average-reward criterion. The ERVL algorithm relies on nearest-neighbor function approximation and on minibatch samples for the value function update. It is universal (it works for any MDP), computationally simple, and yet provides an arbitrarily good approximation with high probability in finite time. As far as we know, this is the first algorithm for non-parametric, continuous-state-space MDPs under the average-reward criterion with these provable properties. Numerical evaluation on a benchmark optimal replacement problem suggests good performance.
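
To make the two ingredients named in the abstract concrete (nearest-neighbor function approximation and minibatch samples for the value update), here is a minimal Python sketch of a relative value iteration loop. It is an illustration under assumptions, not the authors' ERVL algorithm: the toy replacement model in `step`, the 1-nearest-neighbor rule, the choice of the first anchor as reference state, and every name and parameter value are hypothetical.

```python
import numpy as np

def nn_value(anchors, values, queries):
    """1-nearest-neighbor lookup: return the value stored at the closest anchor."""
    idx = np.abs(queries[:, None] - anchors[None, :]).argmin(axis=1)
    return values[idx]

def step(x, a, rng, size):
    """Hypothetical toy replacement model; the state x in [0, 1] is a wear level.

    a = 0 keeps the machine (cost grows with wear), a = 1 replaces it
    (fixed cost, wear restarts from zero). Returns `size` sampled next
    states and the one-step reward.
    """
    wear = 0.3 * rng.random(size)
    start = x if a == 0 else 0.0
    reward = -x if a == 0 else -0.5
    return np.minimum(1.0, start + wear), reward

def relative_value_learning(n_anchors=50, batch=20, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    anchors = np.sort(rng.random(n_anchors))  # sampled states carrying the values
    v = np.zeros(n_anchors)
    gain = 0.0
    for _ in range(iters):
        tv = np.empty(n_anchors)
        for i, x in enumerate(anchors):
            q = []
            for a in (0, 1):
                # Minibatch estimate of the expected next-state value.
                nxt, r = step(x, a, rng, batch)
                q.append(r + nn_value(anchors, v, nxt).mean())
            tv[i] = max(q)
        # Relative step: subtract the value at a reference state
        # (assumption: the first anchor) so the iterates stay bounded.
        gain = tv[0]
        v = tv - gain
    return anchors, v, gain
```

Calling `relative_value_learning()` returns the sampled anchor states, the relative value estimates at those anchors, and the last subtracted offset, which plays the role of an average-reward estimate in this sketch.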
