Articles added in the past year, sorted by date

Offline Imitation Learning with Model-based Reverse Augmentation

JJ Shao, HS Shi, LZ Guo, YF Li - arXiv preprint arXiv:2406.12550, 2024 - arxiv.org
4 days ago - … the conservatism into rollout trajectories in the offline reinforcement learning
context. The … We also consider offline supplementary from sub-optimal policy (hopper-medium-v2, …

Do Robots Dream of Random Trees? Monte Carlo Tree Search for Dynamical, Partially Observable, and Multi-Agent Systems

BP Rivière - 2024 - thesis.library.caltech.edu
16 days ago - … methods that train parameterized policies offline from data have shown recent
success, … compute trajectories in real-time while converging towards globally optimal solutions. …

Confidence Aware Inverse Constrained Reinforcement Learning

SG Subramanian, G Liu, M Elmahgiubi… - … on Machine Learning - openreview.net
17 days ago - … these constraints to learn the correct optimal policy in … expert demonstrations collected
offline. Practitioners prefer to … of expert trajectories is insufficient to learn a constraint with …

Information-Directed Pessimism for Offline Reinforcement Learning

A Koppel, S Bhatt, J Guo, J Eappen, M Wang… - … on Machine Learning - openreview.net
17 days ago - … (2.5)] under an arbitrary policy, and in particular the one associated with optimal
trajectories pπ⋆ (τ), since π⋆ may require visitation to states that are not contained in the offline

Self-Modifying State Modeling for Simultaneous Machine Translation

D Yu, X Kang, Y Liu, Y Zhou, C Zong - arXiv preprint arXiv:2406.02237, 2024 - arxiv.org
18 days ago - … Furthermore, SM2 allows offline machine … optimizes decisions at each state.
Although our experiments show the superiority of not building decision paths during training, there …

Maximum-Entropy Regularized Decision Transformer with Reward Relabelling for Dynamic Recommendation

X Chen, S Wang, L Yao - arXiv preprint arXiv:2406.00725, 2024 - arxiv.org
21 days ago - offline reinforcement learning methods, notable for their data-driven approach
utilizing offline … Additionally, to augment the model’s capability to stitch sub-optimal trajectories, …

Offline Regularised Reinforcement Learning for Large Language Models Alignment

PH Richemond, Y Tang, D Guo, D Calandriello… - arXiv preprint arXiv …, 2024 - arxiv.org
24 days ago - … In order to exploit this single trajectory setting, we introduce Direct Reward … In this
regard, our optimisation is performed like in offline reinforcement learning, where taking new …

HGRL: Human-Driving-Data Guided Reinforcement Learning for Autonomous Driving

H Zhuang, H Chu, Y Wang, B Gao… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
25 days ago - … an offline dataset for preference learning by comparing human driving trajectories
with generated feasible trajectories. … 2) Reinforcement Learning: RL aims to find the optimal

Trajectory Data Suffices for Statistically Efficient Learning in Offline RL with Linear qπ-Realizability and Concentrability

V Tkachuk, G Weisz, C Szepesvári - arXiv preprint arXiv:2405.16809, 2024 - arxiv.org
26 days ago - … We study the offline reinforcement learning (RL) setting, where the objective is to
derive a near-optimal policy for an H-horizon Markov decision process (MDP) using offline data…

Q-value regularized transformer for offline reinforcement learning

S Hu, Z Fan, C Huang, L Shen, Y Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
26 days ago - … on history trajectory and target … optimal trajectories from suboptimal ones due to
the inconsistency between the sampled returns within individual trajectories and the optimal