A hierarchical approach for generating descriptive image paragraphs

Z Gan, L Li, C Li, L Wang, Z Liu… - Foundations and Trends …, 2022 - nowpublishers.com

This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …

Cited by 141 - Related articles - All 7 versions

[PDF] arxiv.org

From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE transactions on …, 2022 - ieeexplore.ieee.org

Connecting Vision and Language plays an essential role in Generative Intelligence. For this
reason, large research efforts have been devoted to image captioning, i.e. describing images …

Cited by 291 - Related articles - All 11 versions

[PDF] thecvf.com

Mvbench: A comprehensive multi-modal video understanding benchmark

K Li, Y Wang, Y He, Y Li, Y Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com

With the rapid development of Multi-modal Large Language Models (MLLMs), a number of
diagnostic benchmarks have recently emerged to evaluate the comprehension capabilities …

Cited by 48 - Related articles - All 4 versions

[PDF] thecvf.com

Exploring and distilling posterior and prior knowledge for radiology report generation

F Liu, X Wu, S Ge, W Fan, Y Zou - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

Automatically generating radiology reports can improve current clinical practice in diagnostic
radiology. On one hand, it can relieve radiologists from the heavy burden of report writing; …

Cited by 266 - Related articles - All 8 versions

[PDF] arxiv.org

Dreamllm: Synergistic multimodal comprehension and creation

R Dong, C Han, Y Peng, Z Qi, Z Ge, J Yang… - arXiv preprint arXiv …, 2023 - arxiv.org

This paper presents DreamLLM, a learning framework that first achieves versatile
Multimodal Large Language Models (MLLMs) empowered with frequently overlooked …

Cited by 58 - Related articles - All 4 versions

[HTML] nih.gov

Deep neural network models for computational histopathology: A survey

CL Srinidhi, O Ciga, AL Martel - Medical image analysis, 2021 - Elsevier

Histopathological images contain rich phenotypic information that can be used to monitor
underlying mechanisms contributing to disease progression and patient survival outcomes …

Cited by 568 - Related articles - All 7 versions

[HTML] acm.org

Kimera: From SLAM to spatial perception with 3D dynamic scene graphs

A Rosinol, A Violette, M Abate… - … Journal of Robotics …, 2021 - journals.sagepub.com

Humans are able to form a complex mental model of the environment they move in. This
mental model captures geometric and semantic aspects of the scene, describes the …

Cited by 188 - Related articles - All 6 versions

[PDF] arxiv.org

Competence-based multimodal curriculum learning for medical report generation

F Liu, S Ge, Y Zou, X Wu - arXiv preprint arXiv:2206.14579, 2022 - arxiv.org

Medical report generation task, which targets to produce long and coherent descriptions of
medical images, has attracted growing research interests recently. Different from the general …

Cited by 148 - Related articles - All 6 versions

[PDF] aaai.org

Gaitset: Regarding gait as a set for cross-view gait recognition

H Chao, Y He, J Zhang, J Feng - Proceedings of the AAAI conference on …, 2019 - aaai.org

As a unique biometric feature that can be recognized at a distance, gait has broad
applications in crime prevention, forensic identification and social security. To portray a gait …

Cited by 542 - Related articles - All 9 versions

[PDF] neurips.cc

Auto-encoding knowledge graph for unsupervised medical report generation

F Liu, C You, X Wu, S Ge, X Sun - Advances in Neural …, 2021 - proceedings.neurips.cc

Medical report generation, which aims to automatically generate a long and coherent report
of a given medical image, has been receiving growing research interests. Existing …

Cited by 104 - Related articles - All 6 versions
