DROID: Learning from offline heterogeneous demonstrations via reward-policy distillation

S Jayanthi, L Chen, N Balabanska… - … on Robot Learning, 2023 - proceedings.mlr.press
Offline Learning from Demonstrations (OLfD) is valuable in domains where
trial-and-error learning is infeasible or specifying a cost function is difficult, such as robotic surgery …

Large language model adaptation for networking

D Wu, X Wang, Y Qiao, Z Wang, J Jiang, S Cui… - arXiv preprint arXiv …, 2024 - arxiv.org
Many networking tasks now employ deep learning (DL) to solve complex prediction and
system optimization problems. However, current design philosophy of DL-based algorithms …

Policy Rehearsing: Training Generalizable Policies for Reinforcement Learning

C Jia, C Gao, H Yin, F Zhang, XH Chen… - The Twelfth …, 2024 - openreview.net
Human beings can make adaptive decisions in a preparatory manner, i.e., by making
preparations in advance, which offers significant advantages in scenarios where both online …

Revisiting Bellman errors for offline model selection

JP Zitovsky, D De Marchi, R Agarwal… - International …, 2023 - proceedings.mlr.press
Offline model selection (OMS), that is, choosing the best policy from a set of many policies
given only logged data, is crucial for applying offline RL in real-world settings. One idea that …

Federated and meta learning over non-wireless and wireless networks: A tutorial

X Liu, Y Deng, A Nallanathan, M Bennis - arXiv preprint arXiv:2210.13111, 2022 - arxiv.org
In recent years, various machine learning (ML) solutions have been developed to solve
resource management, interference management, autonomy, and decision-making …

Mild policy evaluation for offline actor–critic

L Huang, B Dong, J Lu, W Zhang - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
In offline actor–critic (AC) algorithms, the distributional shift between the training data and
target policy causes optimistic value estimates for out-of-distribution (OOD) actions. This …

A survey of demonstration learning

A Correia, LA Alexandre - arXiv preprint arXiv:2303.11191, 2023 - arxiv.org
With the fast improvement of machine learning, reinforcement learning (RL) has been used
to automate human tasks in different areas. However, training such agents is difficult and …

Deep learning subgrid-scale parametrisations for short-term forecasting of sea-ice dynamics with a Maxwell elasto-brittle rheology

TS Finn, C Durand, A Farchi, M Bocquet, Y Chen… - The …, 2023 - tc.copernicus.org
We introduce a proof of concept to parametrise the unresolved subgrid scale of sea-ice
dynamics with deep learning techniques. Instead of parametrising single processes, a single …

Learning to view: Decision transformers for active object detection

W Ding, N Majcherczyk, M Deshpande… - … on Robotics and …, 2023 - ieeexplore.ieee.org
Active perception describes a broad class of techniques that couple planning and
perception systems to move the robot in a way to give the robot more information about the …

A survey of progress on cooperative multi-agent reinforcement learning in open environment

L Yuan, Z Zhang, L Li, C Guan, Y Yu - arXiv preprint arXiv:2312.01058, 2023 - arxiv.org
Multi-agent Reinforcement Learning (MARL) has gained wide attention in recent years and
has made progress in various fields. Specifically, cooperative MARL focuses on training a …