[HTML][HTML] Coarse-to-fine video instance segmentation with factorized conditional appearance flows

Z Qin, X Lu, X Nie, D Liu, Y Yin, W Wang - IEEE/CAA Journal of …, 2023 - ieee-jas.net
We introduce a novel method using a new generative model that automatically learns
effective representations of the target and background appearance to detect, segment and …

Transflow: Transformer as flow learner

Y Lu, Q Wang, S Ma, T Geng… - Proceedings of the …, 2023 - openaccess.thecvf.com
Optical flow is an indispensable building block for various important computer vision tasks,
including motion estimation, object tracking, and disparity measurement. In this work, we …

Clustseg: Clustering for universal segmentation

J Liang, T Zhou, D Liu, W Wang - arXiv preprint arXiv:2305.02187, 2023 - arxiv.org
We present CLUSTSEG, a general, transformer-based framework that tackles different
image segmentation tasks (ie, superpixel, semantic, instance, and panoptic) through a …

Clusterfomer: clustering as a universal visual learner

J Liang, Y Cui, Q Wang, T Geng… - Advances in neural …, 2024 - proceedings.neurips.cc
This paper presents ClusterFormer, a universal vision model that is based on the Clustering
paradigm with TransFormer. It comprises two novel designs: 1) recurrent cross-attention …

Transformer-based visual segmentation: A survey

X Li, H Ding, H Yuan, W Zhang, J Pang… - arXiv preprint arXiv …, 2023 - arxiv.org
Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …

Differential feature awareness network within antagonistic learning for infrared-visible object detection

R Zhang, L Li, Q Zhang, J Zhang, L Xu… - … on Circuits and …, 2023 - ieeexplore.ieee.org
The combination of infrared and visible videos aims to gather more comprehensive feature
information from multiple sources and reach superior results on various practical tasks, such …

Unified 3d segmenter as prototypical classifiers

Z Qin, C Han, Q Wang, X Nie, Y Yin… - Advances in Neural …, 2023 - proceedings.neurips.cc
The task of point cloud segmentation, comprising semantic, instance, and panoptic
segmentation, has been mainly tackled by designing task-specific network architectures …

E^ 2VPT: An Effective and Efficient Approach for Visual Prompt Tuning

C Han, Q Wang, Y Cui, Z Cao, W Wang, S Qi… - arXiv preprint arXiv …, 2023 - arxiv.org
As the size of transformer-based models continues to grow, fine-tuning these large-scale
pretrained vision models for new tasks has become increasingly parameter-intensive …

Large-scale person detection and localization using overhead fisheye cameras

L Yang, L Li, X Xin, Y Sun, Q Song… - Proceedings of the …, 2023 - openaccess.thecvf.com
Location determination finds wide applications in daily life. Instead of existing efforts devoted
to localizing tourist photos captured by perspective cameras, in this article, we focus on …

[PDF][PDF] Prompt Learns Prompt: Exploring Knowledge-Aware Generative Prompt Collaboration For Video Captioning.

L Yan, C Han, Z Xu, D Liu, Q Wang - IJCAI, 2023 - ijcai.org
Fine-tuning large vision-language models is a challenging task. Prompt tuning approaches
have been introduced to learn fixed textual or visual prompts while freezing the pre-trained …