Related articles - Academic Resource Search

EarthGPT: A universal multi-modal large language model for multi-sensor image comprehension in remote sensing domain

W Zhang, M Cai, T Zhang, Y Zhuang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Multimodal large language models (MLLMs) have demonstrated remarkable success in
vision and visual-language tasks within the natural image domain. Owing to the significant …

Cited by 10 | Related articles | All 3 versions

[PDF] mdpi.com

RS-LLaVA: A large vision-language model for joint captioning and question answering in remote sensing imagery

Y Bazi, L Bashmal, MM Al Rahhal, R Ricci, F Melgani - Remote Sensing, 2024 - mdpi.com

In this paper, we delve into the innovative application of large language models (LLMs) and
their extension, large vision-language models (LVLMs), in the field of remote sensing (RS) …

Cited by 4 | Related articles | All 4 versions

See, perceive and answer: A unified benchmark for high-resolution post-disaster evaluation in remote sensing images

D Zhao, J Lu, B Yuan - IEEE Transactions on Geoscience and …, 2024 - ieeexplore.ieee.org

Visual-language generation for remote sensing image (RSI) is an emerging and challenging
research area that requires multitask learning to achieve a comprehensive understanding …

Cited by 2 | Related articles | All 2 versions

[PDF] arxiv.org

RegionBLIP: A unified multi-modal pre-training framework for holistic and regional comprehension

Q Zhou, C Yu, S Zhang, S Wu, Z Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

In this work, we investigate extending the comprehension of Multi-modal Large Language
Models (MLLMs) to regional objects. To this end, we propose to extract features …

Cited by 12 | Related articles | All 2 versions

[PDF] arxiv.org

LHRS-Bot: Empowering remote sensing with VGI-enhanced large multimodal language model

D Muhtar, Z Li, F Gu, X Zhang, P Xiao - arXiv preprint arXiv:2402.02544, 2024 - arxiv.org

The revolutionary capabilities of large language models (LLMs) have paved the way for
multimodal large language models (MLLMs) and fostered diverse applications across …

Cited by 6 | Related articles | All 2 versions

Language integration in remote sensing: Tasks, datasets, and future directions

L Bashmal, Y Bazi, F Melgani… - … and Remote Sensing …, 2023 - ieeexplore.ieee.org

The emerging field of vision–language models, which combines computer vision and natural
language processing (NLP), has gained significant interest and exploration. This integration …

Cited by 4 | Related articles | All 3 versions

[PDF] thecvf.com

GeoChat: Grounded large vision-language model for remote sensing

K Kuckreja, MS Danish, M Naseer… - Proceedings of the …, 2024 - openaccess.thecvf.com

Recent advancements in Large Vision-Language Models (VLMs) have shown great
promise in natural image domains allowing users to hold a dialogue about given visual …

Cited by 25 | Related articles | All 3 versions

[PDF] arxiv.org

Vision-language models in remote sensing: Current progress and future trends

X Li, C Wen, Y Hu, Z Yuan… - IEEE Geoscience and …, 2024 - ieeexplore.ieee.org

The remarkable achievements of ChatGPT and Generative Pre-trained Transformer 4 (GPT-4)
have sparked a wave of interest and research in the field of large language models …

Cited by 23 | Related articles | All 5 versions

[PDF] arxiv.org

Large language models for captioning and retrieving remote sensing images

JD Silva, J Magalhães, D Tuia, B Martins - arXiv preprint arXiv:2402.06475, 2024 - arxiv.org

Image captioning and cross-modal retrieval are examples of tasks that involve the joint
analysis of visual and linguistic information. In connection to remote sensing imagery, these …

Cited by 5 | Related articles | All 2 versions

[PDF] arxiv.org

Multimodal ArXiv: A dataset for improving scientific comprehension of large vision-language models

L Li, Y Wang, R Xu, P Wang, X Feng, L Kong… - arXiv preprint arXiv …, 2024 - arxiv.org

Large vision-language models (LVLMs), exemplified by GPT-4V, excel across diverse tasks
involving concrete images from natural scenes. However, their ability to interpret abstract …

Cited by 4 | Related articles | All 2 versions
