A review on 2D instance segmentation based on deep neural networks

W Gu, S Bai, L Kong - Image and Vision Computing, 2022 - Elsevier
Image instance segmentation involves labeling pixels of images with classes and instances,
which is one of the pivotal technologies in many domains, such as natural scenes …

Making images real again: A comprehensive survey on deep image composition

L Niu, W Cong, L Liu, Y Hong, B Zhang, J Liang… - arXiv preprint arXiv …, 2021 - arxiv.org
As a common image editing operation, image composition aims to combine the foreground
from one image and another background image, resulting in a composite image. However …

Vision transformer adapter for dense predictions

Z Chen, Y Duan, W Wang, J He, T Lu, J Dai… - arXiv preprint arXiv …, 2022 - arxiv.org
This work investigates a simple yet powerful adapter for Vision Transformer (ViT). Unlike
recent visual transformers that introduce vision-specific inductive biases into their …

MaxViT: Multi-axis vision transformer

Z Tu, H Talebi, H Zhang, F Yang, P Milanfar… - European conference on …, 2022 - Springer
Transformers have recently gained significant attention in the computer vision community.
However, the lack of scalability of self-attention mechanisms with respect to image size has …

Bidirectional copy-paste for semi-supervised medical image segmentation

Y Bai, D Chen, Q Li, W Shen… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
In semi-supervised medical image segmentation, there exist empirical mismatch problems
between labeled and unlabeled data distribution. The knowledge learned from the labeled …

Swin transformer: Hierarchical vision transformer using shifted windows

Z Liu, Y Lin, Y Cao, H Hu, Y Wei… - Proceedings of the …, 2021 - openaccess.thecvf.com
This paper presents a new vision Transformer, called Swin Transformer, that capably serves
as a general-purpose backbone for computer vision. Challenges in adapting Transformer …

Focal self-attention for local-global interactions in vision transformers

J Yang, C Li, P Zhang, X Dai, B Xiao, L Yuan… - arXiv preprint arXiv …, 2021 - arxiv.org
Recently, Vision Transformer and its variants have shown great promise on various
computer vision tasks. The ability of capturing short- and long-range visual dependencies …

Simple copy-paste is a strong data augmentation method for instance segmentation

G Ghiasi, Y Cui, A Srinivas, R Qian… - Proceedings of the …, 2021 - openaccess.thecvf.com
Building instance segmentation models that are data-efficient and can handle rare object
categories is an important challenge in computer vision. Leveraging data augmentations is a …

Improved YOLOv5 network for real-time multi-scale traffic sign detection

J Wang, Y Chen, Z Dong, M Gao - Neural Computing and Applications, 2023 - Springer
Traffic sign detection is a challenging task for the unmanned driving system, especially for
the detection of multi-scale targets and the real-time problem of detection. In the traffic sign …

Focal attention for long-range interactions in vision transformers

J Yang, C Li, P Zhang, X Dai, B Xiao… - Advances in Neural …, 2021 - proceedings.neurips.cc
Recently, Vision Transformer and its variants have shown great promise on various
computer vision tasks. The ability to capture local and global visual dependencies through …