Provable Policy Gradient Methods for Average-Reward Markov Potential Games

M Cheng, R Zhou, PR Kumar… - … Conference on Artificial …, 2024 - proceedings.mlr.press
We study Markov potential games under the infinite horizon average reward criterion. Most
previous studies have been for discounted rewards. We prove that both algorithms based on …

[PDF] A Survey of Meta-Reinforcement Learning Research (元强化学习研究综述)

陈奕宇, 霍静, 丁天雨, 高阳 - Journal of Software (软件学报), 2023 - jos.org.cn
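In recent years, deep reinforcement learning (DRL) has achieved remarkable success in many sequential
decision-making tasks, but its current success depends largely on massive amounts of training data and computational resources …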

[PDF] On Adaptivity and Safety in Sequential Decision Making

S Chaudhary - IJCAI, 2023 - ijcai.org
Sequential decision making is an important field in machine learning, encompassing
techniques such as online optimization, structured bandits, and reinforcement learning …

Supervised Meta-Reinforcement Learning With Trajectory Optimization for Manipulation Tasks

L Wang, Y Zhang, D Zhu, S Coleman… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Learning from a small number of samples with reinforcement learning (RL) is challenging in
many tasks, especially in real-world applications such as robotics. Meta-reinforcement learning (meta-RL) has …
