Rank-DETR for high quality object detection

Y Pu, W Liang, Y Hao, Y Yuan… - Advances in …, 2024 - proceedings.neurips.cc
Modern detection transformers (DETRs) use a set of object queries to predict a list of
bounding boxes, sort them by their classification confidence scores, and select the top …

Train once, get a family: State-adaptive balances for offline-to-online reinforcement learning

S Wang, Q Yang, J Gao, M Lin… - Advances in …, 2024 - proceedings.neurips.cc
Offline-to-online reinforcement learning (RL) is a training paradigm that combines pre-training on a pre-collected dataset with fine-tuning in an online environment. However, the …

Understanding, predicting and better resolving Q-value divergence in offline-RL

Y Yue, R Lu, B Kang, S Song… - Advances in Neural …, 2024 - proceedings.neurips.cc
The divergence of Q-value estimation has been a prominent issue in offline reinforcement learning (offline RL), where the agent has no access to real dynamics. Traditional beliefs …

Counterfactual-augmented importance sampling for semi-offline policy evaluation

S Tang, J Wiens - Advances in Neural Information …, 2023 - proceedings.neurips.cc
In applying reinforcement learning (RL) to high-stakes domains, quantitative and qualitative
evaluation using observational data can help practitioners understand the generalization …

Adaptive Advantage-Guided Policy Regularization for Offline Reinforcement Learning

T Liu, Y Li, Y Lan, H Gao, W Pan, X Xu - arXiv preprint arXiv:2405.19909, 2024 - arxiv.org
In offline reinforcement learning, the challenge of out-of-distribution (OOD) is pronounced.
To address this, existing methods often constrain the learned policy through policy …