Foundation models for decision making: Problems, methods, and opportunities

S Yang, O Nachum, Y Du, J Wei, P Abbeel… - arXiv preprint arXiv …, 2023 - arxiv.org
Foundation models pretrained on diverse data at scale have demonstrated extraordinary
capabilities in a wide range of vision and language tasks. When such models are deployed …

Transformers in reinforcement learning: a survey

P Agarwal, AA Rahman, PL St-Charles… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have significantly impacted domains like natural language processing,
computer vision, and robotics, where they improve performance compared to other neural …

A survey on masked autoencoder for self-supervised learning in vision and beyond

C Zhang, C Zhang, J Song, JSK Yi, K Zhang… - arXiv preprint arXiv …, 2022 - arxiv.org
Masked autoencoders are scalable vision learners, as the title of MAE (He et al., 2022) puts it, which suggests that self-supervised learning (SSL) in vision might undertake a similar …

A survey on transformers in reinforcement learning

W Li, H Luo, Z Lin, C Zhang, Z Lu, D Ye - arXiv preprint arXiv:2301.03044, 2023 - arxiv.org
The Transformer has been considered the dominant neural architecture in NLP and CV, mostly under supervised settings. Recently, a similar surge of using Transformers has appeared in …

Transformer in transformer as backbone for deep reinforcement learning

H Mao, R Zhao, H Chen, J Hao, Y Chen, D Li… - arXiv preprint arXiv …, 2022 - arxiv.org
Designing better deep networks and better reinforcement learning (RL) algorithms are both
important for deep RL. This work focuses on the former. Previous methods build the network …

Vision transformers for end-to-end vision-based quadrotor obstacle avoidance

A Bhattacharya, N Rao, D Parikh, P Kunapuli… - arXiv preprint arXiv …, 2024 - arxiv.org
We demonstrate the capabilities of an attention-based end-to-end approach for high-speed
vision-based quadrotor obstacle avoidance in dense, cluttered environments, with …

MaDi: Learning to Mask Distractions for Generalization in Visual Deep Reinforcement Learning

B Grooten, T Tomilin, G Vasan, ME Taylor… - arXiv preprint arXiv …, 2023 - arxiv.org
The visual world provides an abundance of information, but many input pixels received by
agents often contain distracting stimuli. Autonomous agents need the ability to distinguish …

Pretraining the Vision Transformer Using Self-Supervised Methods for Vision Based Deep Reinforcement Learning

M Goulão, AL Oliveira - ECAI, 2023 - ebooks.iospress.nl
The Vision Transformer architecture has been shown to be competitive in the computer vision
(CV) space, where it has dethroned convolution-based networks in several benchmarks …

Deep reinforcement learning with swin transformers

L Meng, M Goodwin, A Yazidi, P Engelstad - Proceedings of the 2024 …, 2024 - dl.acm.org
Transformers are neural network models that utilize multiple layers of self-attention heads
and have exhibited enormous potential in natural language processing tasks. Meanwhile …

Vision-based efficient robotic manipulation with a dual-streaming compact convolutional transformer

H Guo, M Song, Z Ding, C Yi, F Jiang - Sensors, 2023 - mdpi.com
Learning from visual observation for efficient robotic manipulation remains a significant
challenge in Reinforcement Learning (RL). Although the collocation of RL policies and …