Joint optimization of multi-objective reinforcement learning with policy gradient based algorithm

Q Bai, M Agarwal, V Aggarwal - arXiv preprint arXiv:2105.14125, 2021 - arxiv.org
Many engineering problems have multiple objectives, and the overall aim is to optimize a
non-linear function of these objectives. In this paper, we formulate the problem of …

Joint optimization of concave scalarized multi-objective reinforcement learning with policy gradient based algorithm

Q Bai, M Agarwal, V Aggarwal - Journal of Artificial Intelligence Research, 2022 - jair.org
Many engineering problems have multiple objectives, and the overall aim is to optimize a
non-linear function of these objectives. In this paper, we formulate the problem of …

Finite-time complexity of incremental policy gradient methods for solving multi-task reinforcement learning

Y Bai, T Doan - 6th Annual Learning for Dynamics & Control …, 2024 - proceedings.mlr.press
We consider a multi-task learning problem, where an agent is presented a number of $N$
reinforcement learning tasks. To solve this problem, we are interested in studying the …

Nonlinear Multi-objective Reinforcement Learning with Provable Guarantees

N Peng, B Fain - arXiv preprint arXiv:2311.02544, 2023 - arxiv.org
We describe RA-E3 (Reward-Aware Explicit Explore or Exploit), an algorithm with provable
guarantees for solving a single or multi-objective Markov Decision Process (MDP) where we …

Cooperative Multiagent Reinforcement Learning With Partial Observations

Y Zhang, MM Zavlanos - IEEE Transactions on Automatic …, 2023 - ieeexplore.ieee.org
In this article, we propose a distributed zeroth-order policy optimization method for
multiagent reinforcement learning (MARL). Existing MARL algorithms often assume that …

Adaptive objective selection for correlated objectives in multi-objective reinforcement learning

T Brys, K Van Moffaert, A Nowé… - Proceedings of the 2014 …, 2014 - ai.vub.ac.be
In this paper we introduce a novel scale-invariant and parameterless technique, called
adaptive objective selection, that allows a temporal-difference learning agent to exploit the …

Policy gradient using weak derivatives for reinforcement learning

S Bhatt, A Koppel… - 2019 IEEE 58th …, 2019 - ieeexplore.ieee.org
This paper considers policy search in continuous state-action reinforcement learning
problems. Typically, one computes search directions using a classic expression for the …

Sample efficient policy gradient methods with recursive variance reduction

P Xu, F Gao, Q Gu - arXiv preprint arXiv:1909.08610, 2019 - arxiv.org
Improving the sample efficiency in reinforcement learning has been a long-standing
research problem. In this work, we aim to reduce the sample complexity of existing policy …

Exploiting multiple secondary reinforcers in policy gradient reinforcement learning

G Grudic, L Ungar - International Joint Conference on Artificial …, 2001 - academia.edu
Abstract Most formulations of Reinforcement Learning depend on a single reinforcement
reward value to guide the search for the optimal policy solution. If observation of this reward …

Multi-agent reinforcement learning for training and non-linear optimization

A Morcos, A West, B Maguire - Artificial Intelligence and …, 2022 - spiedigitallibrary.org
The field of Reinforcement Learning continues to show promise in solving old problems in
new and innovative ways. Thanks to the algorithms' ability to learn without an explicit set of …