Joint optimization of multi-objective reinforcement learning with policy gradient based algorithm

Z Chen, Y Zhou, RR Chen… - … Conference on Machine …, 2022 - proceedings.mlr.press

Actor-critic (AC) algorithms have been widely used in decentralized multi-agent systems to
learn the optimal joint control policy. However, existing decentralized AC algorithms either …

Cited by 24 · All 7 versions

[PDF] neurips.cc

Anchor-changing regularized natural policy gradient for multi-objective reinforcement learning

R Zhou, T Liu, D Kalathil… - Advances in Neural …, 2022 - proceedings.neurips.cc

We study policy optimization for Markov decision processes (MDPs) with multiple reward
value functions, which are to be jointly optimized according to given criteria such as …

Cited by 13 · All 8 versions

[PDF] jair.org

Achieving zero constraint violation for concave utility constrained reinforcement learning via primal-dual approach

Q Bai, AS Bedi, M Agarwal, A Koppel… - Journal of Artificial …, 2023 - jair.org

Reinforcement learning (RL) is widely used in applications where one needs to perform
sequential decision-making while interacting with the environment. The standard RL …

Cited by 6 · All 6 versions

[PDF] openreview.net

On the Hardness of Constrained Cooperative Multi-Agent Reinforcement Learning

Z Chen, Y Zhou, H Huang - The Twelfth International Conference on …, 2024 - openreview.net

Constrained cooperative multi-agent reinforcement learning (MARL) is an emerging
learning framework that has been widely applied to manage multi-agent systems, and many …

Addressing the issue of stochastic environments and local decision-making in multi-objective reinforcement learning

K Ding - arXiv preprint arXiv:2211.08669, 2022 - arxiv.org

Multi-objective reinforcement learning (MORL) is a relatively new field which builds on
conventional Reinforcement Learning (RL) to solve multi-objective problems. One of …

Cited by 2 · All 2 versions

[HTML] mdpi.com

[HTML] Machine Learning for Communications

V Aggarwal - Entropy, 2021 - mdpi.com

Due to the proliferation of applications and services that run over communication networks,
ranging from video streaming and data analytics to robotics and augmented reality …

Cited by 2 · All 7 versions

[PDF] tamu.edu

Information-Theoretic Measures in Selected Learning Problems

R Zhou - 2023 - search.proquest.com

We study the usage of information-theoretic measures in learning problems. The first
problem considered is the algorithm-dependent generalization error bound. Conceptually …

Stochastic Second Order Methods and Finite Time Analysis of Policy Gradient Methods

R Yuan - 2023 - theses.hal.science

To solve large scale machine learning problems, first-order methods such as stochastic
gradient descent and ADAM are the methods of choice because of their low cost per …
