Related articles - Academic resource search

Joint optimization of multi-objective reinforcement learning with policy gradient based algorithm

Q Bai, M Agarwal, V Aggarwal - arXiv preprint arXiv:2105.14125, 2021 - arxiv.org

Many engineering problems have multiple objectives, and the overall aim is to optimize a
non-linear function of these objectives. In this paper, we formulate the problem of …

Cited by 8 · Related articles · All 5 versions

[PDF] jair.org

Joint optimization of concave scalarized multi-objective reinforcement learning with policy gradient based algorithm

Q Bai, M Agarwal, V Aggarwal - Journal of Artificial Intelligence Research, 2022 - jair.org

Many engineering problems have multiple objectives, and the overall aim is to optimize a
non-linear function of these objectives. In this paper, we formulate the problem of …

Cited by 4 · Related articles · All 5 versions

[PDF] mlr.press

Finite-time complexity of incremental policy gradient methods for solving multi-task reinforcement learning

Y Bai, T Doan - 6th Annual Learning for Dynamics & Control …, 2024 - proceedings.mlr.press

We consider a multi-task learning problem, where an agent is presented a number of $N$
reinforcement learning tasks. To solve this problem, we are interested in studying the …

Nonlinear Multi-objective Reinforcement Learning with Provable Guarantees

N Peng, B Fain - arXiv preprint arXiv:2311.02544, 2023 - arxiv.org

We describe RA-E3 (Reward-Aware Explicit Explore or Exploit), an algorithm with provable
guarantees for solving a single or multi-objective Markov Decision Process (MDP) where we …

Cooperative Multiagent Reinforcement Learning With Partial Observations

Y Zhang, MM Zavlanos - IEEE Transactions on Automatic …, 2023 - ieeexplore.ieee.org

In this article, we propose a distributed zeroth-order policy optimization method for
multiagent reinforcement learning (MARL). Existing MARL algorithms often assume that …

Cited by 13 · Related articles · All 6 versions

[PDF] vub.ac.be

[PDF] Adaptive objective selection for correlated objectives in multi-objective reinforcement learning

T Brys, K Van Moffaert, A Nowé… - Proceedings of the 2014 …, 2014 - ai.vub.ac.be

In this paper we introduce a novel scale-invariant and parameterless technique, called
adaptive objective selection, that allows a temporal-difference learning agent to exploit the …

Cited by 5 · Related articles · All 11 versions

[PDF] arxiv.org

Policy gradient using weak derivatives for reinforcement learning

S Bhatt, A Koppel… - 2019 IEEE 58th …, 2019 - ieeexplore.ieee.org

This paper considers policy search in continuous state-action reinforcement learning
problems. Typically, one computes search directions using a classic expression for the …

Cited by 11 · Related articles · All 7 versions

[PDF] arxiv.org

Sample efficient policy gradient methods with recursive variance reduction

P Xu, F Gao, Q Gu - arXiv preprint arXiv:1909.08610, 2019 - arxiv.org

Improving the sample efficiency in reinforcement learning has been a long-standing
research problem. In this work, we aim to reduce the sample complexity of existing policy …

Cited by 99 · Related articles · All 7 versions

[PDF] academia.edu

[PDF] Exploiting multiple secondary reinforcers in policy gradient reinforcement learning

G Grudic, L Ungar - International Joint Conference on Artificial …, 2001 - academia.edu

Abstract Most formulations of Reinforcement Learning depend on a single reinforcement
reward value to guide the search for the optimal policy solution. If observation of this reward …

Cited by 4 · Related articles · All 4 versions

Multi-agent reinforcement learning for training and non-linear optimization

A Morcos, A West, B Maguire - Artificial Intelligence and …, 2022 - spiedigitallibrary.org

The field of Reinforcement Learning continues to show promise in solving old problems in
new and innovative ways. Thanks to the algorithms' ability to learn without an explicit set of …

Cited by 2 · Related articles · All 3 versions
