Multi-scale Vision Transformer (ViT) has emerged as a powerful backbone for computer vision tasks, while the self-attention computation in Transformer scales …
W Yu, C Si, P Zhou, M Luo, Y Zhou… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
MetaFormer, the abstracted architecture of Transformer, has been found to play a significant role in achieving competitive performance. In this paper, we further explore the capacity of …
Vision Transformer (ViT) has gained increasing attention in the computer vision community in recent years. However, the core component of ViT, Self-Attention, lacks explicit …
Vision transformers have become popular as a possible substitute for convolutional neural networks (CNNs) in a variety of computer vision applications. These transformers, with their …
X Zhang, C Cen, F Li, M Liu, W Mu - Expert Systems with Applications, 2023 - Elsevier
In the smart agriculture community, automatic segmentation is an important basis for plant disease detection and identification. However, the complex background and texturally rich …
SA Liu, Y Zhang, Z Qiu, H Xie… - Proceedings of the …, 2023 - openaccess.thecvf.com
Generalized few-shot semantic segmentation (GFSS) distinguishes pixels of base and novel classes from the background simultaneously, conditioning on sufficient data of base classes …
Q Cai, Y Pan, T Yao, CW Ngo… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Recent progress on multi-modal 3D object detection has featured BEV (Bird-Eye-View) based fusion, which effectively unifies both LiDAR point clouds and camera images in a …
Transformers have demonstrated remarkable performance in natural language processing and computer vision. However, existing vision Transformers struggle to learn from limited …
Recent remarkable advances in large-scale text-to-image diffusion models have inspired a significant breakthrough in text-to-3D generation, pursuing 3D content creation solely from a …