H-GAP: Humanoid Control with a Generalist Planner

Z Jiang, Y Xu, N Wagener, Y Luo, M Janner… - arXiv preprint arXiv …, 2023 - arxiv.org
Humanoid control is an important research challenge offering avenues for integration into
human-centric infrastructures and enabling physics-driven humanoid animations. The …

Transductive Reward Inference on Graph

B Qu, X Cao, Q Guo, C Yi, IW Tsang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
In this study, we present a transductive inference approach on that reward information
propagation graph, which enables the effective estimation of rewards for unlabelled data in …

CUDC: A Curiosity-Driven Unsupervised Data Collection Method with Adaptive Temporal Distances for Offline Reinforcement Learning

C Sun, H Qian, C Miao - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Offline reinforcement learning (RL) aims to learn an effective policy from a pre-collected
dataset. Most existing works are to develop sophisticated learning algorithms, with less …

Battery thermal management system optimization using Deep reinforced learning algorithm

H Cheng, S Jung, YB Kim - Applied Thermal Engineering, 2024 - Elsevier
The temperature of the battery plays a critical role in the working conditions and a suitable
temperature environment can help the battery extend its capacity. Air cooling and water …

Trust region policy optimization via entropy regularization for Kullback–Leibler divergence constraint

H Xu, J Xuan, G Zhang, J Lu - Neurocomputing, 2024 - Elsevier
Trust region policy optimization (TRPO) is one of the landmark policy optimization algorithms
in deep reinforcement learning. Its purpose is to maximize a surrogate objective based on …

Intelligent Cooperative Caching at Mobile Edge based on Offline Deep Reinforcement Learning

Z Wang, J Hu, G Min, Z Zhao - ACM Transactions on Sensor Networks, 2023 - dl.acm.org
Cooperative edge caching enables edge servers to jointly utilize their cache to store popular
contents, thus drastically reducing the latency of content acquisition. One fundamental …

OCEAN-MBRL: Offline Conservative Exploration for Model-Based Offline Reinforcement Learning

F Wu, R Zhang, Q Yi, Y Gao, J Guo, S Peng… - Proceedings of the …, 2024 - ojs.aaai.org
Model-based offline reinforcement learning (RL) algorithms have emerged as a promising
paradigm for offline RL. These algorithms usually learn a dynamics model from a static …

Is Mamba Compatible with Trajectory Optimization in Offline Reinforcement Learning?

Y Dai, O Ma, L Zhang, X Liang, S Hu, M Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformer-based trajectory optimization methods have demonstrated exceptional
performance in offline Reinforcement Learning (offline RL), yet it poses challenges due to …

Transfer Reinforcement Learning for Mixed Observability Markov Decision Processes with Time-Varying Interval-Valued Parameters and Its Application in Pandemic …

M Du, H Yu, N Kong - INFORMS Journal on Computing, 2024 - pubsonline.informs.org
We investigate a novel type of online sequential decision problem under uncertainty, namely
mixed observability Markov decision process with time-varying interval-valued parameters …

A Survey of Offline Reinforcement Learning Methods Based on Representation Learning

X Wang, R Wang, Y Cheng - Acta Automatica Sinica, 2024 - aas.net.cn
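Reinforcement learning learns an optimal policy through online interaction between an agent and its environment, and in recent years has become an important means of solving perception and decision-making
problems in complex environments. However, collecting data online can raise issues of safety, time, or cost …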