[BOOK][B] Reinforcement learning for sequential decision and optimal control

SE Li - 2023 - Springer
Since the beginning of the 21st century, artificial intelligence (AI) has been reshaping almost
all areas of human society, which has high potential to spark the fourth industrial revolution …

Latent state marginalization as a low-cost approach for improving exploration

D Zhang, A Courville, Y Bengio, Q Zheng… - arXiv preprint arXiv …, 2022 - arxiv.org
While the maximum entropy (MaxEnt) reinforcement learning (RL) framework--often touted
for its exploration and robustness capabilities--is usually motivated from a probabilistic …

Offline RL with discrete proxy representations for generalizability in POMDPs

P Gu, X Cai, D Xing, X Wang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Abstract Offline Reinforcement Learning (RL) has demonstrated promising results in various
applications by learning policies from previously collected datasets, reducing the need for …

Learning belief representations for partially observable deep RL

A Wang, AC Li, TQ Klassen, RT Icarte… - International …, 2023 - proceedings.mlr.press
Many important real-world Reinforcement Learning (RL) problems involve partial
observability and require policies with memory. Unfortunately, standard deep RL algorithms …

The Wasserstein Believer: Learning Belief Updates for Partially Observable Environments through Reliable Latent Space Models

R Avalos, F Delgrange, A Nowé, GA Pérez… - arXiv preprint arXiv …, 2023 - arxiv.org
Partially Observable Markov Decision Processes (POMDPs) are used to model
environments where the full state cannot be perceived by an agent. As such the agent needs …

DAG-Plan: Generating Directed Acyclic Dependency Graphs for Dual-Arm Cooperative Planning

Z Gao, Y Mu, J Qu, M Hu, L Guo, P Luo, Y Lu - arXiv preprint arXiv …, 2024 - arxiv.org
Dual-arm robots offer enhanced versatility and efficiency over single-arm counterparts by
enabling concurrent manipulation of multiple objects or cooperative execution of tasks using …

Belief-Enriched Pessimistic Q-Learning against Adversarial State Perturbations

X Sun, Z Zheng - arXiv preprint arXiv:2403.04050, 2024 - arxiv.org
Reinforcement learning (RL) has achieved phenomenal success in various domains.
However, its data-driven nature also introduces new vulnerabilities that can be exploited by …

Future Prediction Can be a Strong Evidence of Good History Representation in Partially Observable Environments

J Kwon, L Yang, R Nowak, J Hanna - arXiv preprint arXiv:2402.07102, 2024 - arxiv.org
Learning a good history representation is one of the core challenges of reinforcement
learning (RL) in partially observable environments. Recent works have shown the …

Set-membership belief state-based reinforcement learning for POMDPs

W Wei, L Zhang, L Li, H Song… - … Conference on Machine …, 2023 - proceedings.mlr.press
Reinforcement learning (RL) has made significant progress in areas such as Atari games
and robotic control, where the agents have perfect sensing capabilities. However, in many …

EPPTA: Efficient partially observable reinforcement learning agent for penetration testing applications

Z Li, Q Zhang, G Yang - Engineering Reports, 2023 - Wiley Online Library
In recent years, penetration testing (pen‐testing) has emerged as a crucial process for
evaluating the security level of network infrastructures by simulating real‐world cyber …