Transformers meet visual learning understanding: A comprehensive review

Y Yang, L Jiao, X Liu, F Liu, S Yang, Z Feng… - arXiv preprint arXiv …, 2022 - arxiv.org
The dynamic attention mechanism and global modeling ability give the Transformer strong
feature-learning ability. In recent years, the Transformer has become comparable to CNNs …

EAPT: efficient attention pyramid transformer for image processing

X Lin, S Sun, W Huang, B Sheng, P Li… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Recent transformer-based models, especially patch-based methods, have shown great
potential in vision tasks. However, the split fixed-size patches divide the input features into …

M-FFN: multi-scale feature fusion network for image captioning

J Prudviraj, C Vishnu, CK Mohan - Applied Intelligence, 2022 - Springer
In this work, we present a novel multi-scale feature fusion network (M-FFN) for the image
captioning task to incorporate discriminative features and scene contextual information of an …

From handcrafted to deep features for pedestrian detection: A survey

J Cao, Y Pang, J Xie, FS Khan… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
Pedestrian detection is an important but challenging problem in computer vision, especially
in human-centric tasks. Over the past decade, significant improvement has been witnessed …

Convformer-NSE: A novel end-to-end gearbox fault diagnosis framework under heavy noise using joint global and local information

S Han, H Shao, J Cheng, X Yang… - IEEE/ASME Transactions …, 2022 - ieeexplore.ieee.org
The application of convolutional neural networks (CNNs) has greatly expanded the scope and
application scenarios of intelligent fault diagnosis and brought about a significant improvement of …

Attention-guided context feature pyramid network for object detection

J Cao, Q Chen, J Guo, R Shi - arXiv preprint arXiv:2005.11475, 2020 - arxiv.org
For object detection, how to address the contradictory requirement between feature map
resolution and receptive field on high-resolution inputs remains an open question. In this …

Robust appearance modeling for object detection and tracking: a survey of deep learning approaches

A Mumuni, F Mumuni - Progress in Artificial Intelligence, 2022 - Springer
The task of object detection and tracking is one of the most complex and challenging
problems in artificial intelligence (AI) systems that model perception. Object tracking has …

Center and scale prediction: Anchor-free approach for pedestrian and face detection

W Liu, I Hasan, S Liao - Pattern Recognition, 2023 - Elsevier
Object detection traditionally requires sliding-window classifiers in modern deep-learning-based
approaches. However, both of these approaches require tedious configurations in …

Universal semantic segmentation for fisheye urban driving images

Y Ye, K Yang, K Xiang, J Wang… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
Semantic segmentation is a critical method in the field of autonomous driving. When
performing semantic image segmentation, a wider field of view (FoV) helps to obtain more …

Efficient pedestrian detection in top-view fisheye images using compositions of perspective view patches

SH Chiang, T Wang, YF Chen - Image and Vision Computing, 2021 - Elsevier
Pedestrian detection in images is a topic that has been studied extensively, but existing
detectors designed for perspective images do not perform as successfully on images taken …