TD3 with reverse KL regularizer for offline reinforcement learning from mixed datasets

3 results (0.02 seconds)

Dual Behavior Regularized Offline Deterministic Actor–Critic

S Cao, X Wang, Y Cheng - IEEE Transactions on Systems …, 2024 - ieeexplore.ieee.org

To mitigate the extrapolation error arising from offline reinforcement learning (RL) paradigm,
existing methods typically make learned-functions over-conservative or enforce global policy …

Cited by: 1 · Related articles

[PDF] arxiv.org

Learning on One Mode: Addressing Multi-Modality in Offline Reinforcement Learning

M Wang, Y Jin, G Montana - arXiv preprint arXiv:2412.03258, 2024 - arxiv.org

Offline reinforcement learning (RL) seeks to learn optimal policies from static datasets
without interacting with the environment. A common challenge is handling multi-modal …

Mildly Constrained Evaluation Policy for Offline Reinforcement Learning

L Xu, Z Jiang, J Wang, L Song, J Bian - arXiv preprint arXiv:2306.03680, 2023 - arxiv.org

Offline reinforcement learning (RL) methodologies enforce constraints on the policy to
adhere closely to the behavior policy, thereby stabilizing value learning and mitigating the …
