Offline Reinforcement Learning of Robotic Control Using Deep Kinematics and Dynamics

X Li, W Shang, S Cong - IEEE/ASME Transactions on …, 2023 - ieeexplore.ieee.org
… We design the joint space trajectory to have the same form as the trajectories in the … with
designed trajectories, and the policy of our MBORL algorithm is safely optimized offline. This …

Fine-tuning offline reinforcement learning with model-based policy optimization

A Villaflor, J Dolan, J Schneider - 2020 - openreview.net
… When collecting data in the conservative MDP, we collect h-length truncated trajectories
starting from states in the original offline dataset. By collecting data this way, we are able to …
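The mechanism described here is branching short, h-length model rollouts off states sampled from the offline dataset instead of rolling out full episodes. A minimal sketch of that collection loop, assuming a learned dynamics model, a policy, and a replay buffer with the hypothetical interfaces noted in the comments (none of these names come from the paper):

```python
def collect_truncated_rollouts(dataset_states, policy, model, buffer, h=5):
    """Branch h-step model rollouts from states drawn out of the offline dataset.

    All objects here are hypothetical stand-ins:
      policy.act(state)        -> action
      model.step(state, a)     -> (next_state, reward) under the learned dynamics
      buffer.add(s, a, r, s2)  -> stores a model-generated transition
    """
    for start_state in dataset_states:
        state = start_state
        for _ in range(h):                      # truncate every rollout at length h
            action = policy.act(state)
            next_state, reward = model.step(state, action)
            buffer.add(state, action, reward, next_state)
            state = next_state
```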

Pessimistic Q-learning for offline reinforcement learning: Towards optimal sample complexity

L Shi, G Li, Y Wei, Y Chen… - … on machine learning, 2022 - proceedings.mlr.press
Offline or batch reinforcement learning seeks to learn a near-optimal policy using history …
To counter the insufficient coverage and sample scarcity of many offline datasets, the principle …
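The principle the snippet cuts off at is pessimism: value estimates are penalized for state–action pairs that the dataset covers poorly. A minimal tabular sketch with a count-based penalty (the paper's analysis constructs its penalty differently; this is only illustrative):

```python
import numpy as np

def pessimistic_q_update(Q, counts, s, a, r, s_next, gamma=0.99, lr=0.1, beta=1.0):
    """One tabular Q-learning update with a count-based pessimism penalty.

    Q:      value table of shape [num_states, num_actions]
    counts: dataset visit counts N(s, a); rarely covered pairs get a larger penalty
    beta:   scale of the penalty (illustrative choice, not the paper's)
    """
    penalty = beta / np.sqrt(max(counts[s, a], 1))       # uncertainty penalty
    target = r + gamma * np.max(Q[s_next]) - penalty     # lower-confidence-bound target
    Q[s, a] += lr * (target - Q[s, a])
    return Q
```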

Behavior regularized offline reinforcement learning

Y Wu, G Tucker, O Nachum - arXiv preprint arXiv:1911.11361, 2019 - arxiv.org
… -optimal policies (e.g., robotic control and recommendation systems). To simulate this scenario,
we collect the offline dataset with a sub-optimal … evaluate offline RL algorithms by training
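Behavior regularization penalizes the learned policy for straying from the behavior policy that produced the dataset. A minimal PyTorch-style sketch of such an actor loss with a KL penalty between Gaussian policy heads (names and the specific divergence are assumptions, not the paper's exact formulation):

```python
import torch
from torch.distributions import Normal, kl_divergence

def behavior_regularized_actor_loss(q_values, policy_dist, behavior_dist, alpha=1.0):
    """Actor loss of the form  -Q(s, a~pi) + alpha * D(pi(.|s) || pi_b(.|s)).

    q_values:      Q(s, a) evaluated at actions sampled from the learned policy
    policy_dist:   distribution of the learned policy pi(.|s)
    behavior_dist: distribution of the (estimated) behavior policy pi_b(.|s)
    alpha:         strength of the behavior regularizer
    """
    divergence = kl_divergence(policy_dist, behavior_dist)   # penalize deviation from data
    return (-q_values + alpha * divergence).mean()

# usage sketch with Gaussian policy heads
policy_dist = Normal(torch.zeros(3), torch.ones(3))
behavior_dist = Normal(torch.zeros(3), 0.5 * torch.ones(3))
q_values = torch.randn(3)
loss = behavior_regularized_actor_loss(q_values, policy_dist, behavior_dist, alpha=0.1)
```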

Constrained decision transformer for offline safe reinforcement learning

Z Liu, Z Guo, Y Yao, Z Cen, W Yu… - … Machine Learning, 2023 - proceedings.mlr.press
… We study the offline safe RL problem from a novel multi-objective optimization perspective
… maximum-reward Pareto optimal trajectory with cost less than κ. Then we append the new …
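The step referred to here is selecting, among trajectories whose cumulative cost stays below the threshold κ, the one with maximum return. A minimal sketch of that selection, assuming each trajectory record stores its total return and total cost (the data layout is an assumption, not taken from the paper):

```python
def select_target_trajectory(trajectories, kappa):
    """Return the highest-return trajectory whose total cost is below kappa.

    trajectories: list of dicts with 'return' and 'cost' keys (assumed layout)
    kappa:        cost threshold
    """
    feasible = [t for t in trajectories if t["cost"] <= kappa]
    if not feasible:
        return None                                  # no trajectory meets the constraint
    return max(feasible, key=lambda t: t["return"])  # pick maximum-return feasible one
```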

Preferred-Action-Optimized Diffusion Policies for Offline Reinforcement Learning

T Zhang, J Guan, L Zhao, Y Li, D Li, Z Zeng… - arXiv preprint arXiv …, 2024 - arxiv.org
trajectory-based preference optimization. We directly generate preferred trajectory data
for preference optimization, … We can compare two trajectories based on success or time to …
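A trajectory-level preference rule of the kind mentioned in the snippet can be as simple as preferring success first and shorter completion time second. A minimal sketch under that assumption (the paper's actual criteria may differ):

```python
def prefer(traj_a, traj_b):
    """Return the preferred trajectory: success first, then fewer steps to finish.

    traj_a, traj_b: dicts with 'success' (bool) and 'steps' (int) fields (assumed layout)
    """
    if traj_a["success"] != traj_b["success"]:
        return traj_a if traj_a["success"] else traj_b
    return traj_a if traj_a["steps"] <= traj_b["steps"] else traj_b
```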

Understanding the effects of dataset characteristics on offline reinforcement learning

K Schweighofer, M Hofmarcher, MC Dinu… - arXiv preprint arXiv …, 2021 - academia.edu
… , the risk of crashing the production machines when optimizing production processes, or the risk of
losing … In offline RL, we assume that a dataset P of trajectories is provided. A single trajectory
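The snippet breaks off while defining a trajectory; conventionally, an offline RL trajectory is an ordered sequence of (state, action, reward) transitions, and the dataset is a collection of such sequences. A minimal sketch of that layout (illustrative only, not the paper's code):

```python
from typing import List, NamedTuple
import numpy as np

class Transition(NamedTuple):
    state: np.ndarray
    action: np.ndarray
    reward: float

# a trajectory is an ordered list of transitions; the offline dataset is a set of trajectories
Trajectory = List[Transition]
Dataset = List[Trajectory]
```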

Offline reinforcement learning from images with latent space models

R Rafailov, T Yu, A Rajeswaran… - Learning for dynamics …, 2021 - proceedings.mlr.press
… , latent offline model-based policy optimization (LOMPO), which enables … on offline RL in
high-dimensional POMDPs, where the agent has access to the fixed dataset Denv of trajectories, …

On Transforming Reinforcement Learning With Transformers: The Development Trajectory

S Hu, L Shen, Y Zhang, Y Chen… - … Analysis and Machine …, 2024 - ieeexplore.ieee.org
offline RL setting can make full use of the ability of deep networks to extract the optimal policy
from a large amount of offline … discrepancy between the offline training data and the target …

PLAS: Latent action space for offline reinforcement learning

W Zhou, S Bajracharya, D Held - … on Robot Learning, 2021 - proceedings.mlr.press
… [19] samples action sequences from the CVAE when performing trajectory optimization
with the learned latent dynamics model. Krupnik et al. [20] extended the previous method to …
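PLAS trains its policy in the latent space of a conditional VAE fit to the dataset's actions and decodes the latent output back into the environment's action space, which keeps selected actions close to the data manifold. A minimal PyTorch sketch of that decoding step, assuming a pretrained decoder module (layer sizes, names, and the latent bound are assumptions):

```python
import torch
import torch.nn as nn

class LatentPolicy(nn.Module):
    """Policy that outputs a latent code, decoded into an action by a pretrained CVAE decoder."""

    def __init__(self, state_dim, latent_dim, decoder, max_latent=2.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 256), nn.ReLU(),
            nn.Linear(256, latent_dim), nn.Tanh(),
        )
        self.decoder = decoder          # pretrained CVAE decoder: (state, z) -> action
        self.max_latent = max_latent    # keep latents inside the region the CVAE was trained on

    def forward(self, state):
        z = self.max_latent * self.net(state)   # bounded latent action
        return self.decoder(state, z)           # decoded action stays near the data manifold
```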