EarthGPT: A universal multi-modal large language model for multi-sensor image comprehension in remote sensing domain

W Zhang, M Cai, T Zhang, Y Zhuang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Multimodal large language models (MLLMs) have demonstrated remarkable success in
vision and visual-language tasks within the natural image domain. Owing to the significant …

RS-LLaVA: A large vision-language model for joint captioning and question answering in remote sensing imagery

Y Bazi, L Bashmal, MM Al Rahhal, R Ricci, F Melgani - Remote Sensing, 2024 - mdpi.com
In this paper, we delve into the innovative application of large language models (LLMs) and
their extension, large vision-language models (LVLMs), in the field of remote sensing (RS) …

See, perceive and answer: A unified benchmark for high-resolution post-disaster evaluation in remote sensing images

D Zhao, J Lu, B Yuan - IEEE Transactions on Geoscience and …, 2024 - ieeexplore.ieee.org
Visual-language generation for remote sensing image (RSI) is an emerging and challenging
research area that requires multitask learning to achieve a comprehensive understanding …

RegionBLIP: A unified multi-modal pre-training framework for holistic and regional comprehension

Q Zhou, C Yu, S Zhang, S Wu, Z Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we investigate extending the comprehension of Multi-modal Large Language
Models (MLLMs) to regional objects. To this end, we propose to extract features …

LHRS-Bot: Empowering remote sensing with VGI-enhanced large multimodal language model

D Muhtar, Z Li, F Gu, X Zhang, P Xiao - arXiv preprint arXiv:2402.02544, 2024 - arxiv.org
The revolutionary capabilities of large language models (LLMs) have paved the way for
multimodal large language models (MLLMs) and fostered diverse applications across …

Language Integration in Remote Sensing: Tasks, datasets, and future directions

L Bashmal, Y Bazi, F Melgani… - … and Remote Sensing …, 2023 - ieeexplore.ieee.org
The emerging field of vision–language models, which combines computer vision and natural
language processing (NLP), has gained significant interest and exploration. This integration …

GeoChat: Grounded large vision-language model for remote sensing

K Kuckreja, MS Danish, M Naseer… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advancements in Large Vision-Language Models (VLMs) have shown great
promise in natural image domains allowing users to hold a dialogue about given visual …

Vision-language models in remote sensing: Current progress and future trends

X Li, C Wen, Y Hu, Z Yuan… - IEEE Geoscience and …, 2024 - ieeexplore.ieee.org
The remarkable achievements of ChatGPT and Generative Pre-trained Transformer 4 (GPT-4)
have sparked a wave of interest and research in the field of large language models …

Large language models for captioning and retrieving remote sensing images

JD Silva, J Magalhães, D Tuia, B Martins - arXiv preprint arXiv:2402.06475, 2024 - arxiv.org
Image captioning and cross-modal retrieval are examples of tasks that involve the joint
analysis of visual and linguistic information. In connection to remote sensing imagery, these …

Multimodal ArXiv: A dataset for improving scientific comprehension of large vision-language models

L Li, Y Wang, R Xu, P Wang, X Feng, L Kong… - arXiv preprint arXiv …, 2024 - arxiv.org
Large vision-language models (LVLMs), exemplified by GPT-4V, excel across diverse tasks
involving concrete images from natural scenes. However, their ability to interpret abstract …