Fashionformer: A simple, effective and unified baseline for human fashion segmentation and...

G Cheng, Y Huang, X Li, S Lyu, Z Xu, H Zhao, Q Zhao… - Remote Sensing, 2024 - mdpi.com

Change detection is an essential and widely utilized task in remote sensing that aims to
detect and analyze changes occurring in the same geographical area over time, which has …

Cited by 61 | Related articles | All 2 versions

Transformer-based visual segmentation: A survey

X Li, H Ding, H Yuan, W Zhang, J Pang… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Visual segmentation seeks to partition images, video frames, or point clouds into multiple
segments or groups. This technique has numerous real-world applications, such as …

Cited by 115 | Related articles | All 3 versions

Towards open vocabulary learning: A survey

J Wu, X Li, S Xu, H Yuan, H Ding… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

In the field of visual scene understanding, deep neural networks have made impressive
advancements in various core tasks like segmentation, tracking, and detection. However …

Cited by 124 | Related articles | All 10 versions

UniHCP: A unified model for human-centric perceptions

Y Ci, Y Wang, M Chen, S Tang, L Bai… - Proceedings of the …, 2023 - openaccess.thecvf.com

Human-centric perceptions (e.g., pose estimation, human parsing, pedestrian detection,
person re-identification, etc.) play a key role in industrial applications of visual models. While …

Cited by 60 | Related articles | All 5 versions

TransVOD: end-to-end video object detection with spatial-temporal transformers

Q Zhou, X Li, L He, Y Yang, G Cheng… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org

Detection Transformer (DETR) and Deformable DETR have been proposed to eliminate the
need for many hand-designed components in object detection while demonstrating good …

Cited by 150 | Related articles | All 8 versions

Betrayed by captions: Joint caption grounding and generation for open vocabulary instance segmentation

J Wu, X Li, H Ding, X Li, G Cheng… - Proceedings of the …, 2023 - openaccess.thecvf.com

In this work, we focus on open vocabulary instance segmentation to expand a segmentation
model to classify and segment instance-level novel categories. Previous approaches have …

Cited by 30 | Related articles | All 12 versions

Skeleton-in-context: Unified skeleton sequence modeling with in-context learning

X Wang, Z Fang, X Li, X Li… - Proceedings of the …, 2024 - openaccess.thecvf.com

In-context learning provides a new perspective for multi-task modeling for vision and NLP.
Under this setting the model can perceive tasks from prompts and accomplish them without …

Cited by 12 | Related articles | All 3 versions

Towards robust referring image segmentation

J Wu, X Li, X Li, H Ding, Y Tong… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Referring Image Segmentation (RIS) is a fundamental vision-language task that outputs
object masks based on text descriptions. Many works have achieved considerable progress …

Cited by 47 | Related articles | All 9 versions

Multi-task learning with multi-query transformer for dense prediction

Y Xu, X Li, H Yuan, Y Yang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Previous multi-task dense prediction studies developed complex pipelines such as
multi-modal distillations in multiple stages or searching for task relational contexts for each task …

Cited by 49 | Related articles | All 5 versions

Iterative robust visual grounding with masked reference based centerpoint supervision

M Li, C Wang, W Feng, S Lyu… - Proceedings of the …, 2023 - openaccess.thecvf.com

Visual Grounding (VG) aims at localizing target objects from an image based on given
expressions and has made significant progress with the development of detection and vision …

Cited by 6 | Related articles | All 6 versions