Foundation models for decision making: Problems, methods, and opportunities

S Yang, O Nachum, Y Du, J Wei, P Abbeel… - arXiv preprint arXiv …, 2023 - arxiv.org
Foundation models pretrained on diverse data at scale have demonstrated extraordinary
capabilities in a wide range of vision and language tasks. When such models are deployed …

Transformers in reinforcement learning: a survey

P Agarwal, AA Rahman, PL St-Charles… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have significantly impacted domains like natural language processing,
computer vision, and robotics, where they improve performance compared to other neural …

A survey on masked autoencoder for self-supervised learning in vision and beyond

C Zhang, C Zhang, J Song, JSK Yi, K Zhang… - arXiv preprint arXiv …, 2022 - arxiv.org
Masked autoencoders are scalable vision learners, as the title of MAE (He et al., 2022) puts it, which suggests that self-supervised learning (SSL) in vision might undertake a similar …

A survey on transformers in reinforcement learning

W Li, H Luo, Z Lin, C Zhang, Z Lu, D Ye - arXiv preprint arXiv:2301.03044, 2023 - arxiv.org
The Transformer has been considered the dominant neural architecture in NLP and CV, mostly under supervised settings. Recently, a similar surge of using Transformers has appeared in …

Transformer in transformer as backbone for deep reinforcement learning

H Mao, R Zhao, H Chen, J Hao, Y Chen, D Li… - arXiv preprint arXiv …, 2022 - arxiv.org
Designing better deep networks and better reinforcement learning (RL) algorithms are both
important for deep RL. This work focuses on the former. Previous methods build the network …

Vision transformers for end-to-end vision-based quadrotor obstacle avoidance

A Bhattacharya, N Rao, D Parikh, P Kunapuli… - arXiv preprint arXiv …, 2024 - arxiv.org
We demonstrate the capabilities of an attention-based end-to-end approach for high-speed
vision-based quadrotor obstacle avoidance in dense, cluttered environments, with …

MaDi: Learning to Mask Distractions for Generalization in Visual Deep Reinforcement Learning

B Grooten, T Tomilin, G Vasan, ME Taylor… - arXiv preprint arXiv …, 2023 - arxiv.org
The visual world provides an abundance of information, but many input pixels received by
agents often contain distracting stimuli. Autonomous agents need the ability to distinguish …

Pretraining the Vision Transformer Using Self-Supervised Methods for Vision Based Deep Reinforcement Learning

M Goulão, AL Oliveira - ECAI, 2023 - ebooks.iospress.nl
The Vision Transformer architecture has been shown to be competitive in the computer vision
(CV) space, where it has dethroned convolution-based networks in several benchmarks …

Deep reinforcement learning with swin transformers

L Meng, M Goodwin, A Yazidi, P Engelstad - Proceedings of the 2024 …, 2024 - dl.acm.org
Transformers are neural network models that utilize multiple layers of self-attention heads
and have exhibited enormous potential in natural language processing tasks. Meanwhile …

Vision-based efficient robotic manipulation with a dual-streaming compact convolutional transformer

H Guo, M Song, Z Ding, C Yi, F Jiang - Sensors, 2023 - mdpi.com
Learning from visual observation for efficient robotic manipulation remains a significant
challenge in Reinforcement Learning (RL). Although the collocation of RL policies and …