YOLO-FIRI: Improved YOLOv5 for infrared image object detection

S Li, Y Li, Y Li, M Li, X Xu - IEEE Access, 2021 - ieeexplore.ieee.org
To solve object detection issues in infrared images, such as a low recognition rate and a
high false alarm rate caused by long distances, weak energy, and low resolution, we …

Battle of the backbones: A large-scale comparison of pretrained models across computer vision tasks

M Goldblum, H Souri, R Ni, M Shu… - Advances in …, 2024 - proceedings.neurips.cc
Neural network based computer vision systems are typically built on a backbone, a
pretrained or randomly initialized feature extractor. Several years ago, the default option was …

More ConvNets in the 2020s: Scaling up kernels beyond 51x51 using sparsity

S Liu, T Chen, X Chen, X Chen, Q Xiao, B Wu… - arXiv preprint arXiv …, 2022 - arxiv.org
Transformers have quickly shone in the computer vision world since the emergence of
Vision Transformers (ViTs). The dominant role of convolutional neural networks (CNNs) …

Visformer: The vision-friendly transformer

Z Chen, L Xie, J Niu, X Liu, L Wei… - Proceedings of the …, 2021 - openaccess.thecvf.com
The past year has witnessed the rapid development of applying the Transformer module to
vision problems. While some researchers have demonstrated that Transformer-based …

Scale-aware modulation meet transformer

W Lin, Z Wu, J Chen, J Huang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
This paper presents a new vision Transformer, Scale Aware Modulation Transformer (SMT),
that can handle various downstream tasks efficiently by combining the convolutional network …

SwinTextSpotter: Scene text spotting via better synergy between text detection and text recognition

M Huang, Y Liu, Z Peng, C Liu, D Lin… - Proceedings of the …, 2022 - openaccess.thecvf.com
End-to-end scene text spotting has attracted great attention in recent years due to the
success of exploiting the intrinsic synergy between scene text detection and recognition …

Group DETR: Fast DETR training with group-wise one-to-many assignment

Q Chen, X Chen, J Wang, S Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Detection transformer (DETR) relies on one-to-one assignment, assigning one ground-truth
object to one prediction, for end-to-end detection without NMS post-processing. It is known …
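The one-to-one assignment this entry describes can be sketched as a minimum-cost bipartite matching between predictions and ground truths. The snippet below is an illustrative toy, not the paper's or DETR's actual code: DETR uses Hungarian matching over a combined classification/box cost, whereas this sketch brute-forces a plain L1 box cost, and all names here are hypothetical.

```python
# Toy sketch of DETR-style one-to-one assignment (illustrative only).
# Each ground-truth box is matched to a distinct prediction so the
# total matching cost is minimal; unmatched predictions become "no
# object", which is what removes the need for NMS post-processing.
from itertools import permutations

def l1(a, b):
    # L1 distance between two (x1, y1, x2, y2) boxes
    return sum(abs(x - y) for x, y in zip(a, b))

def one_to_one_assign(preds, gts):
    """Return [(pred_index, gt_index), ...] minimising total L1 cost."""
    best, best_cost = [], float("inf")
    # brute-force every injective mapping of ground truths onto predictions
    # (real DETR uses the Hungarian algorithm for this step)
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(l1(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best = [(p, g) for g, p in enumerate(perm)]
            best_cost = cost
    return best

preds = [(0.1, 0.1, 0.3, 0.3), (0.5, 0.5, 0.9, 0.9), (0.0, 0.0, 1.0, 1.0)]
gts = [(0.5, 0.5, 0.9, 0.9), (0.1, 0.1, 0.3, 0.3)]
print(one_to_one_assign(preds, gts))  # each GT claims exactly one prediction
```

Group DETR's contribution, per the snippet above, is to speed up training by running several such one-to-one assignments in parallel groups, giving a one-to-many signal overall while keeping NMS-free inference.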

Boosting R-CNN: Reweighting R-CNN samples by RPN's error for underwater object detection

P Song, P Li, L Dai, T Wang, Z Chen - Neurocomputing, 2023 - Elsevier
Complicated underwater environments bring new challenges to object detection, such as
unbalanced light conditions, low contrast, occlusion, and mimicry of aquatic organisms …

EAPT: Efficient attention pyramid transformer for image processing

X Lin, S Sun, W Huang, B Sheng, P Li… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Recent transformer-based models, especially patch-based methods, have shown great
potential in vision tasks. However, splitting inputs into fixed-size patches divides the input features into …

Efficient training of visual transformers with small datasets

Y Liu, E Sangineto, W Bi, N Sebe… - Advances in Neural …, 2021 - proceedings.neurips.cc
Visual Transformers (VTs) are emerging as an architectural paradigm alternative to
Convolutional networks (CNNs). Unlike CNNs, VTs can capture global relations …