A survey on offline reinforcement learning: Taxonomy, review, and open problems

S Jayanthi, L Chen, N Balabanska… - … on Robot Learning, 2023 - proceedings.mlr.press

Offline Learning from Demonstrations (OLfD) is valuable in domains where trial-and-error
learning is infeasible or specifying a cost function is difficult, such as robotic surgery …

Cited by 1 · Related articles · All 2 versions

[PDF] arxiv.org

Large language model adaptation for networking

D Wu, X Wang, Y Qiao, Z Wang, J Jiang, S Cui… - arXiv preprint arXiv …, 2024 - arxiv.org

Many networking tasks now employ deep learning (DL) to solve complex prediction and
system optimization problems. However, current design philosophy of DL-based algorithms …

Cited by 5 · Related articles · All 2 versions

[PDF] openreview.net

Policy Rehearsing: Training Generalizable Policies for Reinforcement Learning

C Jia, C Gao, H Yin, F Zhang, XH Chen… - The Twelfth …, 2024 - openreview.net

Human beings can make adaptive decisions in a preparatory manner, i.e., by making
preparations in advance, which offers significant advantages in scenarios where both online …

Cited by 2 · Related articles

[PDF] mlr.press

Revisiting Bellman errors for offline model selection

JP Zitovsky, D De Marchi, R Agarwal… - International …, 2023 - proceedings.mlr.press

Offline model selection (OMS), that is, choosing the best policy from a set of many policies
given only logged data, is crucial for applying offline RL in real-world settings. One idea that …

Cited by 6 · Related articles · All 7 versions

[PDF] arxiv.org

Federated and meta learning over non-wireless and wireless networks: A tutorial

X Liu, Y Deng, A Nallanathan, M Bennis - arXiv preprint arXiv:2210.13111, 2022 - arxiv.org

In recent years, various machine learning (ML) solutions have been developed to solve
resource management, interference management, autonomy, and decision-making …

Cited by 9 · Related articles

Mild policy evaluation for offline actor–critic

L Huang, B Dong, J Lu, W Zhang - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

In offline actor–critic (AC) algorithms, the distributional shift between the training data and
target policy causes optimistic value estimates for out-of-distribution (OOD) actions. This …

Cited by 3 · Related articles · All 3 versions

[PDF] arxiv.org

A survey of demonstration learning

A Correia, LA Alexandre - arXiv preprint arXiv:2303.11191, 2023 - arxiv.org

With the fast improvement of machine learning, reinforcement learning (RL) has been used
to automate human tasks in different areas. However, training such agents is difficult and …

Cited by 8 · Related articles · All 3 versions

[PDF] copernicus.org

Deep learning subgrid-scale parametrisations for short-term forecasting of sea-ice dynamics with a Maxwell elasto-brittle rheology

TS Finn, C Durand, A Farchi, M Bocquet, Y Chen… - The …, 2023 - tc.copernicus.org

We introduce a proof of concept to parametrise the unresolved subgrid scale of sea-ice
dynamics with deep learning techniques. Instead of parametrising single processes, a single …

Cited by 10 · Related articles · All 11 versions

[PDF] arxiv.org

Learning to view: Decision transformers for active object detection

W Ding, N Majcherczyk, M Deshpande… - … on Robotics and …, 2023 - ieeexplore.ieee.org

Active perception describes a broad class of techniques that couple planning and
perception systems to move the robot in a way to give the robot more information about the …

Cited by 8 · Related articles · All 4 versions

A survey of progress on cooperative multi-agent reinforcement learning in open environment

L Yuan, Z Zhang, L Li, C Guan, Y Yu - arXiv preprint arXiv:2312.01058, 2023 - arxiv.org

Multi-agent Reinforcement Learning (MARL) has gained wide attention in recent years and
has made progress in various fields. Specifically, cooperative MARL focuses on training a …

Cited by 6 · Related articles · All 2 versions
