Sample and communication-efficient decentralized actor-critic algorithms with finite-time analysis

Z Chen, Y Zhou, RR Chen… - … Conference on Machine …, 2022 - proceedings.mlr.press
Actor-critic (AC) algorithms have been widely used in decentralized multi-agent systems to
learn the optimal joint control policy. However, existing decentralized AC algorithms either …

Anchor-changing regularized natural policy gradient for multi-objective reinforcement learning

R Zhou, T Liu, D Kalathil… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study policy optimization for Markov decision processes (MDPs) with multiple reward
value functions, which are to be jointly optimized according to given criteria such as …

Achieving zero constraint violation for concave utility constrained reinforcement learning via primal-dual approach

Q Bai, AS Bedi, M Agarwal, A Koppel… - Journal of Artificial …, 2023 - jair.org
Reinforcement learning (RL) is widely used in applications where one needs to perform
sequential decision-making while interacting with the environment. The standard RL …

On the Hardness of Constrained Cooperative Multi-Agent Reinforcement Learning

Z Chen, Y Zhou, H Huang - The Twelfth International Conference on …, 2024 - openreview.net
Constrained cooperative multi-agent reinforcement learning (MARL) is an emerging
learning framework that has been widely applied to manage multi-agent systems, and many …

Addressing the issue of stochastic environments and local decision-making in multi-objective reinforcement learning

K Ding - arXiv preprint arXiv:2211.08669, 2022 - arxiv.org
Multi-objective reinforcement learning (MORL) is a relatively new field which builds on
conventional Reinforcement Learning (RL) to solve multi-objective problems. One of …

Machine Learning for Communications

V Aggarwal - Entropy, 2021 - mdpi.com
Due to the proliferation of applications and services that run over communication networks,
ranging from video streaming and data analytics to robotics and augmented reality …

Information-Theoretic Measures in Selected Learning Problems

R Zhou - 2023 - search.proquest.com
We study the usage of information-theoretic measures in learning problems. The first
problem considered is the algorithm-dependent generalization error bound. Conceptually …

Stochastic Second Order Methods and Finite Time Analysis of Policy Gradient Methods

R Yuan - 2023 - theses.hal.science
To solve large scale machine learning problems, first-order methods such as stochastic
gradient descent and ADAM are the methods of choice because of their low cost per …