An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

DK Nguyen, M Assran, U Jain, MR Oswald… - arXiv preprint arXiv …, 2024 - arxiv.org
This work does not introduce a new method. Instead, we present an interesting finding that
questions the necessity of the inductive bias--locality in modern computer vision …

Multi-grained contrast for data-efficient unsupervised representation learning

C Shen, J Chen, J Wang - arXiv preprint arXiv:2407.02014, 2024 - arxiv.org
The existing contrastive learning methods mainly focus on single-grained representation
learning, e.g., part-level, object-level or scene-level ones, thus inevitably neglecting the …

SemanticMIM: Marring Masked Image Modeling with Semantics Compression for General Visual Representation

Y Yuan, H Dou, F Guo, X Li - arXiv preprint arXiv:2406.10673, 2024 - arxiv.org
This paper represents a neat yet effective framework, named SemanticMIM, to integrate the
advantages of masked image modeling (MIM) and contrastive learning (CL) for general …

Contrastive Representation Learning With Mixture-of-Instance-and-Pixel

D Cheng, J Yin - … Journal of Information Technologies and Systems …, 2024 - igi-global.com
Contrastive learning has remarkable transfer learning capabilities. But many current
methods are pretrained based on instance-level or pixel-level pretext tasks, resulting in …