S Cong, Y Zhou - Artificial Intelligence Review, 2023 - Springer
The research advances concerning typical convolutional neural network (CNN) architectures, as well as their optimizations, are analyzed and elaborated in detail in this …
This work investigates a simple yet powerful adapter for Vision Transformer (ViT). Unlike recent visual transformers that introduce vision-specific inductive biases into their …
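The snippet describes an adapter attached to a plain ViT. As a rough illustration of the general adapter pattern (not the paper's specific ViT-Adapter design; the module name, bottleneck ratio, and shapes below are assumptions), a small residual bottleneck can be bolted onto a frozen backbone:

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic adapter sketch: a small bottleneck MLP added residually to a
    ViT block's token output (illustrative pattern, not the paper's design)."""
    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.down = nn.Linear(dim, dim // reduction)
        self.act = nn.GELU()
        self.up = nn.Linear(dim // reduction, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The residual connection preserves the pretrained representation;
        # only the tiny bottleneck path is trained.
        return x + self.up(self.act(self.down(x)))

tokens = torch.randn(2, 197, 768)   # (batch, patches + cls, dim), e.g. ViT-B/16
adapter = BottleneckAdapter(768)
print(adapter(tokens).shape)        # torch.Size([2, 197, 768])
```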
X Pan, C Ge, R Lu, S Song, G Chen… - Proceedings of the …, 2022 - openaccess.thecvf.com
Convolution and self-attention are two powerful techniques for representation learning, and they are usually considered to be peer approaches that are distinct from each other. In this …
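To make the claimed relation concrete: both operators begin with 1×1 projections of the input, so that stage can be shared. The following sketch (an assumed simplification of the ACmix idea; the class name, head count, and learnable branch weights are illustrative) computes an attention branch and a convolution branch from the same 1×1 projections:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedProjectionMix(nn.Module):
    """Sketch (assumed simplification): the 1x1 projections used by
    self-attention also feed a convolution branch, so the two operators
    share most of their computation."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.heads = heads
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1)   # shared 1x1 projections
        self.conv_mix = nn.Conv2d(dim * 3, dim, 3, padding=1, groups=dim)
        self.alpha = nn.Parameter(torch.ones(2))            # learnable branch weights

    def forward(self, x):
        b, c, h, w = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=1)

        # Attention branch: global token mixing from the shared projections.
        def flat(t):  # (B, C, H, W) -> (B, heads, HW, C // heads)
            return t.reshape(b, self.heads, c // self.heads, h * w).transpose(-1, -2)
        attn = F.softmax(flat(q) @ flat(k).transpose(-1, -2) / (c // self.heads) ** 0.5, dim=-1)
        att_out = (attn @ flat(v)).transpose(-1, -2).reshape(b, c, h, w)

        # Convolution branch: local aggregation of the same projected features.
        conv_out = self.conv_mix(torch.cat([q, k, v], dim=1))
        return self.alpha[0] * att_out + self.alpha[1] * conv_out

x = torch.randn(1, 64, 14, 14)
print(SharedProjectionMix(64)(x).shape)  # torch.Size([1, 64, 14, 14])
```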
We present Mobile-Former, a parallel design of MobileNet and transformer with a two-way bridge in between. This structure leverages the advantages of MobileNet at local …
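A minimal sketch of such a parallel design with a two-way bridge, assuming cross-attention in both directions between a depthwise-conv feature map and a handful of global tokens (module names and sizes below are illustrative, not Mobile-Former's exact blocks):

```python
import torch
import torch.nn as nn

class TwoWayBridgeBlock(nn.Module):
    """Illustrative parallel conv/transformer block with a two-way bridge,
    loosely after Mobile-Former (details are assumptions)."""
    def __init__(self, dim: int, heads: int = 2):
        super().__init__()
        self.local = nn.Conv2d(dim, dim, 3, padding=1, groups=dim)  # MobileNet-style branch
        self.to_tokens = nn.MultiheadAttention(dim, heads, batch_first=True)  # Mobile -> Former
        self.to_pixels = nn.MultiheadAttention(dim, heads, batch_first=True)  # Former -> Mobile
        self.token_ffn = nn.Sequential(nn.Linear(dim, 2 * dim), nn.GELU(), nn.Linear(2 * dim, dim))

    def forward(self, x, tokens):
        b, c, h, w = x.shape
        pixels = x.flatten(2).transpose(1, 2)                  # (B, HW, C)
        # Bridge in: a few global tokens query the feature map.
        tokens = tokens + self.to_tokens(tokens, pixels, pixels)[0]
        tokens = tokens + self.token_ffn(tokens)
        # Local branch runs in parallel on the feature map.
        x = x + self.local(x)
        # Bridge out: pixels query the updated global tokens.
        pixels = x.flatten(2).transpose(1, 2)
        pixels = pixels + self.to_pixels(pixels, tokens, tokens)[0]
        return pixels.transpose(1, 2).reshape(b, c, h, w), tokens

x, t = torch.randn(1, 32, 28, 28), torch.randn(1, 6, 32)   # feature map + 6 global tokens
y, t2 = TwoWayBridgeBlock(32)(x, t)
print(y.shape, t2.shape)  # torch.Size([1, 32, 28, 28]) torch.Size([1, 6, 32])
```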
Vision transformers have shown great potential in various computer vision tasks owing to their strong capability to model long-range dependency using the self-attention mechanism …
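For reference, the long-range behavior comes directly from the attention operation itself: a single layer lets every token aggregate information from every other token. A minimal single-head version (illustrative, not any particular paper's implementation):

```python
import torch
import torch.nn.functional as F

# Minimal single-head self-attention: each output token is a weighted sum
# over *all* input tokens, so dependencies span the whole sequence in one layer.
def self_attention(x, wq, wk, wv):
    q, k, v = x @ wq, x @ wk, x @ wv
    attn = F.softmax(q @ k.transpose(-1, -2) / k.shape[-1] ** 0.5, dim=-1)
    return attn @ v  # row i mixes information from every token j

x = torch.randn(196, 64)                     # e.g. 14x14 patch tokens
w = [torch.randn(64, 64) for _ in range(3)]
print(self_attention(x, *w).shape)           # torch.Size([196, 64])
```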
We present a High-Resolution Transformer (HRFormer) that learns high-resolution representations for dense prediction tasks, in contrast to the original Vision Transformer that …
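HRFormer inherits HRNet's multi-resolution layout. A toy sketch of the resolution-exchange step it builds on (an assumed simplification; channel counts and fusion details are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoResolutionExchange(nn.Module):
    """Sketch of HRNet-style multi-resolution fusion (assumed simplification):
    two parallel streams exchange information by resampling, so a
    high-resolution path is kept throughout the network."""
    def __init__(self, c_hi: int, c_lo: int):
        super().__init__()
        self.hi_from_lo = nn.Conv2d(c_lo, c_hi, 1)                      # upsample path
        self.lo_from_hi = nn.Conv2d(c_hi, c_lo, 3, stride=2, padding=1)  # downsample path

    def forward(self, hi, lo):
        hi_out = hi + F.interpolate(self.hi_from_lo(lo), size=hi.shape[-2:],
                                    mode="bilinear", align_corners=False)
        lo_out = lo + self.lo_from_hi(hi)
        return hi_out, lo_out

hi, lo = torch.randn(1, 32, 56, 56), torch.randn(1, 64, 28, 28)
h2, l2 = TwoResolutionExchange(32, 64)(hi, lo)
print(h2.shape, l2.shape)  # torch.Size([1, 32, 56, 56]) torch.Size([1, 64, 28, 28])
```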
Q Chen, Q Wu, J Wang, Q Hu, T Hu… - Proceedings of the …, 2022 - openaccess.thecvf.com
While local-window self-attention performs notably well in vision tasks, it suffers from a limited receptive field and weak modeling capability. This is mainly because it performs self …
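The limitation is easy to see in code: with non-overlapping windows, the attention matrix never connects tokens in different windows. A minimal sketch (illustrative partitioning; no relative position bias or masking):

```python
import torch
import torch.nn.functional as F

# Non-overlapping window self-attention (illustrative): tokens only attend
# within their own window, so the receptive field is capped at window**2.
def window_attention(x, window: int):
    b, c, h, w = x.shape
    # Partition (B, C, H, W) into (B * nWindows, window * window, C).
    xw = x.reshape(b, c, h // window, window, w // window, window)
    xw = xw.permute(0, 2, 4, 3, 5, 1).reshape(-1, window * window, c)
    attn = F.softmax(xw @ xw.transpose(-1, -2) / c ** 0.5, dim=-1)
    out = attn @ xw  # no information crosses window boundaries here
    out = out.reshape(b, h // window, w // window, window, window, c)
    return out.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)

x = torch.randn(1, 32, 28, 28)
print(window_attention(x, 7).shape)  # torch.Size([1, 32, 28, 28])
```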