[HTML] Optimizing trajectories for highway driving with offline reinforcement learning

B Mirchevska, M Werling, J Boedecker - Frontiers in Future …, 2023 - frontiersin.org
… propose a Reinforcement Learning-based approach, which learns target trajectory parameters
for fully autonomous driving on highways. The trained agent outputs continuous trajectory

Offline reinforcement learning as one big sequence modeling problem

M Janner, Q Li, S Levine - Advances in neural information …, 2021 - proceedings.neurips.cc
… When the goal is to reproduce the distribution of trajectories in the training data, we can
optimize directly for the probability of a trajectory τ. This situation matches the goal of sequence …

[PDF] A Trajectory Perspective on the Role of Data Sampling Techniques in Offline Reinforcement Learning

J Liu, Y Ma, J Hao, Y Hu, Y Zheng, T Lv… - Proceedings of the 23rd …, 2024 - ifaamas.org
offline trajectory data, we investigate the impact of data sampling processes on offline RL
algorithms from a trajectory … In this section, we evaluate PTR, which optimizes the trajectory

Critic-guided decision transformer for offline reinforcement learning

Y Wang, C Yang, Y Wen, Y Liu, Y Qiao - Proceedings of the AAAI …, 2024 - ojs.aaai.org
optimal and suboptimal trajectories without predefined returns, often resulting in suboptimal
policies that mirror the distribution of the training data. To overcome the limitations of IL, …

Beyond uniform sampling: Offline reinforcement learning with imbalanced datasets

ZW Hong, A Kumar, S Karnik… - Advances in …, 2023 - proceedings.neurips.cc
… over the average return of trajectories in the dataset. We … offline RL algorithms of staying
close to the trajectories in the dataset. If the dataset primarily consists of sub-optimal trajectories, …

Safe offline reinforcement learning with real-time budget constraints

Q Lin, B Tang, Z Wu, C Yu, S Mao… - … Machine Learning, 2023 - proceedings.mlr.press
… To model the optimal trajectory distribution w.r.t. a certain … To obtain the optimal trajectory
distribution in Theorem 4.1, … 2022), a recently proposed trajectory optimization framework that …

LAPO: Latent-variable advantage-weighted policy optimization for offline reinforcement learning

X Chen, A Ghadirzadeh, T Yu, J Wang… - Advances in …, 2022 - proceedings.neurips.cc
… In this paper, we study an offline RL setup for learning from heterogeneous datasets where
trajectories are collected using policies with different purposes, leading to a multi-modal data …

Offline reinforcement learning with implicit Q-learning

I Kostrikov, A Nair, S Levine - arXiv preprint arXiv:2110.06169, 2021 - arxiv.org
… We examine domains that contain near-optimal trajectories, where single-step methods
perform well, as well as domains with no optimal trajectories at all, which require multi-step …

Offline reinforcement learning: Tutorial, review, and perspectives on open problems

S Levine, A Kumar, G Tucker, J Fu - arXiv preprint arXiv:2005.01643, 2020 - arxiv.org
… Another way to optimize the reinforcement learning objective … to then recover a near-optimal
policy. A value function provides … sample new trajectories from πβ, while old trajectories are …

A survey on offline reinforcement learning: Taxonomy, review, and open problems

RF Prudencio, MROA Maximo… - … Networks and Learning …, 2023 - ieeexplore.ieee.org
… ] and trajectory optimization [27… with learning an optimal policy and an optimal trajectory
distribution, respectively. Currently, a limited number of works have reviewed the field of offline