Boosting entity-aware image captioning with multi-modal knowledge graph

C Zhang, C Zhang, S Zheng, Y Qiao, C Li… - arXiv preprint arXiv …, 2023 - arxiv.org

As ChatGPT goes viral, generative AI (AIGC, aka AI-generated content) has made headlines
everywhere because of its ability to analyze and create text, images, and beyond. With such …

Cited by 194 · Related articles · All 4 versions

[PDF] arxiv.org

Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arXiv preprint arXiv …, 2024 - arxiv.org

Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

Cited by 33 · Related articles · All 2 versions

[PDF] arxiv.org

Multi-modal knowledge graph construction and application: A survey

X Zhu, Z Li, X Wang, X Jiang, P Sun… - … on Knowledge and …, 2022 - ieeexplore.ieee.org

Recent years have witnessed the resurgence of knowledge engineering which is featured
by the fast growth of knowledge graphs. However, most of existing knowledge graphs are …

Cited by 182 · Related articles · All 7 versions

[PDF] thecvf.com

Text with knowledge graph augmented transformer for video captioning

X Gu, G Chen, Y Wang, L Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Video captioning aims to describe the content of videos using natural language. Although
significant progress has been made, there is still much room to improve the performance for …

Cited by 50 · Related articles · All 6 versions

[PDF] neurips.cc

Contrastive language-image pre-training with knowledge graphs

X Pan, T Ye, D Han, S Song… - Advances in Neural …, 2022 - proceedings.neurips.cc

Recent years have witnessed the fast development of large-scale pre-training frameworks
that can extract multi-modal representations in a unified form and achieve promising …

Cited by 42 · Related articles · All 6 versions

[PDF] thecvf.com

EI-CLIP: Entity-aware interventional contrastive learning for e-commerce cross-modal retrieval

H Ma, H Zhao, Z Lin, A Kale, Z Wang… - Proceedings of the …, 2022 - openaccess.thecvf.com

… recommendation, and marketing services. Extensive efforts have been made to
conquer the cross-modal retrieval problem in the general domain. When it comes to E …

Cited by 63 · Related articles · All 3 versions

[PDF] aaai.org

Multi-modal knowledge hypergraph for diverse image retrieval

Y Zeng, Q Jin, T Bao, W Li - Proceedings of the AAAI Conference on …, 2023 - ojs.aaai.org

The task of keyword-based diverse image retrieval has received considerable attention due
to its wide demand in real-world scenarios. Existing methods either rely on a multi-stage re …

Cited by 18 · Related articles · All 2 versions

Image captioning based on scene graphs: A survey

J Jia, X Ding, S Pang, X Gao, X Xin, R Hu… - Expert Systems with …, 2023 - Elsevier

Although recent developments in deep learning have brought several tasks closer to human
performance, there is still a significant gap between human and machine performance in …

Cited by 13 · Related articles · All 2 versions

[PDF] mlr.press

MMEL: a joint learning framework for multi-mention entity linking

C Yang, B He, Y Wu, C Xing, L He… - Uncertainty in Artificial …, 2023 - proceedings.mlr.press

Entity linking, bridging mentions in the contexts with their corresponding entities in the
knowledge bases, has attracted wide attention due to many potential applications. Recently …

Cited by 11 · Related articles · All 5 versions

Neural entity alignment with cross-modal supervision

F Su, C Xu, H Yang, Z Chen, N Jing - Information Processing & …, 2023 - Elsevier

The majority of currently available entity alignment (EA) solutions primarily rely on structural
information to align entities, which is biased and disregards additional multi-source …

Cited by 10 · Related articles · All 2 versions
