B Qu, X Cao, Q Guo, C Yi, IW Tsang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
In this study, we present a transductive inference approach on that reward information propagation graph, which enables the effective estimation of rewards for unlabelled data in …
C Sun, H Qian, C Miao - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
Offline reinforcement learning (RL) aims to learn an effective policy from a pre-collected dataset. Most existing works focus on developing sophisticated learning algorithms, with less …
H Cheng, S Jung, YB Kim - Applied Thermal Engineering, 2024 - Elsevier
The temperature of the battery plays a critical role in the working conditions and a suitable temperature environment can help the battery extend its capacity. Air cooling and water …
H Xu, J Xuan, G Zhang, J Lu - Neurocomputing, 2024 - Elsevier
Trust region policy optimization (TRPO) is one of the landmark policy optimization algorithms in deep reinforcement learning. Its purpose is to maximize a surrogate objective based on …
Z Wang, J Hu, G Min, Z Zhao - ACM Transactions on Sensor Networks, 2023 - dl.acm.org
Cooperative edge caching enables edge servers to jointly utilize their cache to store popular contents, thus drastically reducing the latency of content acquisition. One fundamental …
Model-based offline reinforcement learning (RL) algorithms have emerged as a promising paradigm for offline RL. These algorithms usually learn a dynamics model from a static …
Y Dai, O Ma, L Zhang, X Liang, S Hu, M Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Transformer-based trajectory optimization methods have demonstrated exceptional performance in offline Reinforcement Learning (offline RL), yet they pose challenges due to …
M Du, H Yu, N Kong - INFORMS Journal on Computing, 2024 - pubsonline.informs.org
We investigate a novel type of online sequential decision problem under uncertainty, namely mixed observability Markov decision process with time-varying interval-valued parameters …