… When collecting data in the conservative MDP, we collect h-length truncated trajectories starting from states in the original offline dataset. By collecting data this way, we are able to …
L Shi, G Li, Y Wei, Y Chen… - … on machine learning, 2022 - proceedings.mlr.press
… Offline or batch reinforcement learning seeks to learn a near-optimal policy using history … To counter the insufficient coverage and sample scarcity of many offline datasets, the principle …
Y Wu, G Tucker, O Nachum - arXiv preprint arXiv:1911.11361, 2019 - arxiv.org
… -optimal policies (eg, robotic control and recommendation systems). To simulate this scenario, we collect the offline dataset with a sub-optimal … evaluate offline RL algorithms by training …
… We study the offline safe RL problem from a novel multi-objective optimization perspective … maximum-reward Pareto optimal trajectory with cost less than κ. Then we append the new …
T Zhang, J Guan, L Zhao, Y Li, D Li, Z Zeng… - arXiv preprint arXiv …, 2024 - arxiv.org
… trajectory-based preference optimization. We directly generate preferred trajectory data for preference optimization, … We can compare two trajectories based on success or time to …
… , the risk of crashing the production machines when optimizing production processes, or the risk of losing … In Offline RL, we assume that a dataset P of trajectories is provided. A single trajectory …
… , latent offline model-based policy optimization (LOMPO), which enables … on offline RL in high-dimensional POMDPs, where the agent has access to the fixed dataset Denv of trajectories, …
… offline RL setting can make full use of the ability of deep networks to extract the optimal policy from a large amount of offline … discrepancy between the offline training data and the target …
W Zhou, S Bajracharya, D Held - … on Robot Learning, 2021 - proceedings.mlr.press
… [19] samples action sequences from the CVAE when they perform trajectory optimization with the learned latent dynamics model. Krupnik et al. [20] extended the previous method to …