M Janner, Q Li, S Levine - Advances in neural information …, 2021 - proceedings.neurips.cc
… When the goal is to reproduce the distribution of trajectories in the training data, we can optimize directly for the probability of a trajectory τ. This situation matches the goal of sequence …
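The excerpt above describes optimizing directly for the likelihood of whole trajectories with a sequence model. A minimal sketch of that idea, assuming trajectories have already been discretized into token sequences and using a small recurrent model as a stand-in for whatever architecture the paper actually uses (the names TrajectorySequenceModel and nll_loss are illustrative, not from the source):

```python
# Hedged sketch: maximize log p(tau) by next-token prediction over
# discretized trajectory tokens. All names below are assumptions.
import torch
import torch.nn as nn

class TrajectorySequenceModel(nn.Module):
    def __init__(self, vocab_size: int, hidden: int = 256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)  # stand-in for a Transformer
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, tokens):                # tokens: (batch, T) integer ids
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h)                   # logits: (batch, T, vocab)

def nll_loss(model, tokens):
    # Negative log-likelihood of each trajectory: predict token t+1 from
    # tokens up to t, so minimizing this maximizes p(tau) under the model.
    logits = model(tokens[:, :-1])
    return nn.functional.cross_entropy(
        logits.reshape(-1, logits.size(-1)), tokens[:, 1:].reshape(-1)
    )

# Usage on a batch of discretized trajectories (synthetic data for illustration):
model = TrajectorySequenceModel(vocab_size=1000)
batch = torch.randint(0, 1000, (8, 128))      # 8 trajectories, 128 tokens each
loss = nll_loss(model, batch)
loss.backward()
```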
… offline trajectory data, we investigate the impact of data sampling processes on offline RL algorithms from a trajectory … In this section, we evaluate PTR, which optimizes the trajectory …
… optimal and suboptimal trajectories without predefined returns, often resulting in suboptimal policies that mirror the distribution of the training data. To overcome the limitations of IL, …
… over the average return of trajectories in the dataset. We … offline RL algorithms of staying close to the trajectories in the dataset. If the dataset primarily consists of sub-optimal trajectories, …
Q Lin, B Tang, Z Wu, C Yu, S Mao… - … Machine Learning, 2023 - proceedings.mlr.press
… To model the optimal trajectory distribution w.r.t. a certain … To obtain the optimal trajectory distribution in Theorem 4.1, … 2022), a recently proposed trajectory optimization framework that …
… In this paper, we study an offline RL setup for learning from heterogeneous datasets where trajectories are collected using policies with different purposes, leading to a multi-modal data …
… We examine domains that contain near-optimal trajectories, where single-step methods perform well, as well as domains with no optimal trajectories at all, which require multi-step …
… Another way to optimize the reinforcement learning objective … to then recover a near-optimal policy. A value function provides … sample new trajectories from πβ, while old trajectories are …
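This excerpt contrasts trajectory-level modeling with value-based policy recovery. As one generic illustration (not the specific method of either cited paper), a sketch of fitting a value function by temporal-difference learning on a fixed dataset and reading off a greedy policy, assuming discrete actions and transitions stored as tensors; QNetwork, td_update, and greedy_policy are hypothetical names:

```python
# Hedged sketch: recover a policy from offline data via a learned Q-function.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, n_actions)
        )

    def forward(self, obs):
        return self.net(obs)                     # Q(s, .) for every discrete action

def td_update(q, q_target, batch, optimizer, gamma=0.99):
    # One temporal-difference step on a batch from the fixed (pi_beta-collected) dataset.
    obs, act, rew, next_obs, done = batch
    with torch.no_grad():
        target = rew + gamma * (1 - done) * q_target(next_obs).max(dim=1).values
    pred = q(obs).gather(1, act.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

def greedy_policy(q, obs):
    # The recovered policy acts greedily with respect to the learned values.
    return q(obs).argmax(dim=1)

# Usage (hypothetical dimensions): q = QNetwork(obs_dim=17, n_actions=6)
```

In a purely offline setting, the max over actions in the target is usually constrained or regularized so the learned policy stays close to the data distribution, which connects to the point above about offline RL methods hewing to dataset trajectories.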
… ] and trajectory optimization [27… with learning an optimal policy and an optimal trajectory distribution, respectively. Currently, a limited number of works have reviewed the field of offline …