Large-scale vision foundation models have made significant progress in visual tasks on natural images, with vision transformers (ViTs) being the primary choice due to their good …
Deep learning has largely reshaped remote sensing (RS) research for aerial image understanding and achieved great success. Nevertheless, most of the existing deep models …
Human civilization has an increasingly powerful influence on the Earth system, and Earth observations are an invaluable tool for assessing and mitigating the negative impacts. To …
Self-supervised learning (SSL) has gained widespread attention in the remote sensing (RS) and Earth observation (EO) communities owing to its ability to learn task-agnostic …
In this paper, we show the surprisingly good properties of plain vision transformers for body pose estimation from various aspects, namely simplicity in model structure, scalability in …
Self-supervised representation learning (SSL) typically suffers from inadequate data utilization and feature specificity due to suboptimal sampling strategies and the …
Existing methods for arbitrary-shaped text detection in natural scenes face two critical issues, i.e., (1) fractured detections at the gaps within a text instance; and (2) inaccurate detections …
Self-supervised visual representation learning (SSL) aims to extract the most distinctive features from unlabeled datasets to overcome challenges of labor-intensive and time …