YOLO-FIRI: Improved YOLOv5 for infrared image object detection

S Li, Y Li, Y Li, M Li, X Xu - IEEE Access, 2021 - ieeexplore.ieee.org
To solve object detection issues in infrared images, such as a low recognition rate and a
high false alarm rate caused by long distances, weak energy, and low resolution, we …

Battle of the backbones: A large-scale comparison of pretrained models across computer vision tasks

M Goldblum, H Souri, R Ni, M Shu… - Advances in …, 2024 - proceedings.neurips.cc
Neural network based computer vision systems are typically built on a backbone, a
pretrained or randomly initialized feature extractor. Several years ago, the default option was …

More ConvNets in the 2020s: Scaling up kernels beyond 51x51 using sparsity

S Liu, T Chen, X Chen, X Chen, Q Xiao, B Wu… - arXiv preprint arXiv …, 2022 - arxiv.org
Transformers have quickly shone in the computer vision world since the emergence of
Vision Transformers (ViTs). The dominant role of convolutional neural networks (CNNs) …

Visformer: The vision-friendly transformer

Z Chen, L Xie, J Niu, X Liu, L Wei… - Proceedings of the …, 2021 - openaccess.thecvf.com
The past year has witnessed the rapid development of applying the Transformer module to
vision problems. While some researchers have demonstrated that Transformer-based …

Scale-aware modulation meet transformer

W Lin, Z Wu, J Chen, J Huang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
This paper presents a new vision Transformer, Scale Aware Modulation Transformer (SMT),
that can handle various downstream tasks efficiently by combining the convolutional network …

SwinTextSpotter: Scene text spotting via better synergy between text detection and text recognition

M Huang, Y Liu, Z Peng, C Liu, D Lin… - Proceedings of the …, 2022 - openaccess.thecvf.com
End-to-end scene text spotting has attracted great attention in recent years due to the
success of exploiting the intrinsic synergy between scene text detection and recognition …

Group DETR: Fast DETR training with group-wise one-to-many assignment

Q Chen, X Chen, J Wang, S Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Detection transformer (DETR) relies on one-to-one assignment, assigning one ground-truth
object to one prediction, for end-to-end detection without NMS post-processing. It is known …
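The one-to-one assignment this entry describes can be sketched as a minimum-cost bipartite matching between predictions and ground truths. The snippet below is an illustrative toy, not the paper's or DETR's actual code: DETR uses Hungarian matching over a combined classification/box cost, whereas this sketch brute-forces a plain L1 box cost, and all names here are hypothetical.

```python
# Toy sketch of DETR-style one-to-one assignment (illustrative only).
# Each ground-truth box is matched to a distinct prediction so the
# total matching cost is minimal; unmatched predictions become "no
# object", which is what removes the need for NMS post-processing.
from itertools import permutations

def l1(a, b):
    # L1 distance between two (x1, y1, x2, y2) boxes
    return sum(abs(x - y) for x, y in zip(a, b))

def one_to_one_assign(preds, gts):
    """Return [(pred_index, gt_index), ...] minimising total L1 cost."""
    best, best_cost = [], float("inf")
    # brute-force every injective mapping of ground truths onto predictions
    # (real DETR uses the Hungarian algorithm for this step)
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(l1(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best = [(p, g) for g, p in enumerate(perm)]
            best_cost = cost
    return best

preds = [(0.1, 0.1, 0.3, 0.3), (0.5, 0.5, 0.9, 0.9), (0.0, 0.0, 1.0, 1.0)]
gts = [(0.5, 0.5, 0.9, 0.9), (0.1, 0.1, 0.3, 0.3)]
print(one_to_one_assign(preds, gts))  # each GT claims exactly one prediction
```

Group DETR's contribution, per the snippet above, is to speed up training by running several such one-to-one assignments in parallel groups, giving a one-to-many signal overall while keeping NMS-free inference.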

Boosting R-CNN: Reweighting R-CNN samples by RPN's error for underwater object detection

P Song, P Li, L Dai, T Wang, Z Chen - Neurocomputing, 2023 - Elsevier
Complicated underwater environments bring new challenges to object detection, such as
unbalanced light conditions, low contrast, occlusion, and mimicry of aquatic organisms …

EAPT: Efficient attention pyramid transformer for image processing

X Lin, S Sun, W Huang, B Sheng, P Li… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Recent transformer-based models, especially patch-based methods, have shown great
potential in vision tasks. However, splitting inputs into fixed-size patches divides the input features into …

Efficient training of visual transformers with small datasets

Y Liu, E Sangineto, W Bi, N Sebe… - Advances in Neural …, 2021 - proceedings.neurips.cc
Visual Transformers (VTs) are emerging as an architectural paradigm alternative to
Convolutional networks (CNNs). Unlike CNNs, VTs can capture global relations …