
Pyramid masked image modeling for transformer-based aerial object detection

Authors

Cong Zhang, Tianshan Liu, Yakun Ju, Kin-Man Lam

Publication date

2023/10/8

Conference

2023 IEEE International Conference on Image Processing (ICIP)

Pages

1675-1679

Publisher

IEEE

Description

Two obstacles hinder the advancement of vision Transformer-based aerial object detection: the scarcity of annotated samples and the difficulty of preserving multi-scale hierarchical representations. The emergence of self-supervised learning has inspired some solutions to the first issue. However, most of these solutions focus on single-scale features, which is at odds with addressing the second issue. To bridge this gap, this paper proposes a novel pyramid masked image modeling (MIM) framework, termed PyraMIM, for self-supervised pretraining in aerial scenarios. Without manual annotation, PyraMIM establishes pyramid representations during pretraining, which can be seamlessly adapted to downstream aerial object detection to improve performance. Experimental results demonstrate the effectiveness and superiority of the method.

Total citations

Cited by 4

2023: 2, 2024: 2
