Few-shot remote sensing image scene classification: Recent advances, new baselines, and future trends

C Qiu, X Zhang, X Tong, N Guan, X Yi, K Yang… - ISPRS Journal of …, 2024 - Elsevier
Remote sensing image scene classification (RSI-SC) is crucial for various high-level
applications, including RSI retrieval, image captioning, and object detection. Deep learning …

RSGPT: A remote sensing vision language model and benchmark

Y Hu, J Yuan, C Wen, X Lu, X Li - arXiv preprint arXiv:2307.15266, 2023 - arxiv.org
The emergence of large-scale large language models, with GPT-4 as a prominent example,
has significantly propelled the rapid advancement of artificial general intelligence and …

Advances and challenges in deep learning-based change detection for remote sensing images: A review through various learning paradigms

L Wang, M Zhang, X Gao, W Shi - Remote Sensing, 2024 - mdpi.com
Change detection (CD) in remote sensing (RS) imagery is a pivotal method for detecting
changes in the Earth's surface, finding wide applications in urban planning, disaster …

Remote sensing vision-language foundation models without annotations via ground remote alignment

U Mall, CP Phoo, MK Liu, C Vondrick… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce a method to train vision-language models for remote-sensing images without
using any textual annotations. Our key insight is to use co-located internet imagery taken on …

Bootstrapping interactive image-text alignment for remote sensing image captioning

C Yang, Z Li, L Zhang - IEEE Transactions on Geoscience and …, 2024 - ieeexplore.ieee.org
Recently, remote sensing image captioning (RSIC) has gained significant attention in the
remote sensing community. Due to the significant differences in spatial resolution of remote …

LHRS-Bot: Empowering remote sensing with VGI-enhanced large multimodal language model

D Muhtar, Z Li, F Gu, X Zhang, P Xiao - arXiv preprint arXiv:2402.02544, 2024 - arxiv.org
The revolutionary capabilities of large language models (LLMs) have paved the way for
multimodal large language models (MLLMs) and fostered diverse applications across …

PatFig: Generating short and long captions for patent figures

D Aubakirova, K Gerdes, L Liu - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
This paper introduces Qatent PatFig, a novel large-scale patent figure dataset
comprising 30,000+ patent figures from over 11,000 European patent applications. For each …

Large language models for captioning and retrieving remote sensing images

JD Silva, J Magalhães, D Tuia, B Martins - arXiv preprint arXiv:2402.06475, 2024 - arxiv.org
Image captioning and cross-modal retrieval are examples of tasks that involve the joint
analysis of visual and linguistic information. In connection to remote sensing imagery, these …

Towards an Exhaustive Evaluation of Vision-Language Foundation Models

E Salin, S Ayache, B Favre - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Vision-language foundation models have had considerable increase in performances in the
last few years. However, there is still a lack of comprehensive evaluation methods able to …

Brain-inspired remote sensing foundation models and open problems: A comprehensive survey

L Jiao, Z Huang, X Lu, X Liu, Y Yang… - IEEE Journal of …, 2023 - ieeexplore.ieee.org
The foundation model (FM) has garnered significant attention for its remarkable transfer
performance in downstream tasks. Typically, it undergoes task-agnostic pretraining on a …