A survey of multi-modal knowledge graphs: Technologies and trends

W Liang, PD Meo, Y Tang, J Zhu - ACM Computing Surveys, 2024 - dl.acm.org
In recent years, Knowledge Graphs (KGs) have played a crucial role in the development of
advanced knowledge-intensive applications, such as recommender systems and semantic …

A systematic literature review on multimodal machine learning: Applications, challenges, gaps and future directions

A Barua, MU Ahmed, S Begum - IEEE Access, 2023 - ieeexplore.ieee.org
Multimodal machine learning (MML) is a tempting multidisciplinary research area where
heterogeneous data from multiple modalities and machine learning (ML) are combined to …

Making the most of text semantics to improve biomedical vision–language processing

B Boecking, N Usuyama, S Bannur, DC Castro… - European conference on …, 2022 - Springer
Multi-modal data abounds in biomedicine, such as radiology images and reports.
Interpreting this data at scale is essential for improving clinical care and accelerating clinical …

Open-vocabulary object detection using captions

A Zareian, KD Rosa, DH Hu… - Proceedings of the …, 2021 - openaccess.thecvf.com
Despite the remarkable accuracy of deep neural networks in object detection, they are costly
to train and scale due to supervision requirements. Particularly, learning more object …

Multi-modal knowledge graph construction and application: A survey

X Zhu, Z Li, X Wang, X Jiang, P Sun… - … on Knowledge and …, 2022 - ieeexplore.ieee.org
Recent years have witnessed the resurgence of knowledge engineering which is featured
by the fast growth of knowledge graphs. However, most of existing knowledge graphs are …

Referring image segmentation using text supervision

F Liu, Y Liu, Y Kong, K Xu, L Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Existing Referring Image Segmentation (RIS) methods typically require expensive
pixel-level or box-level annotations for supervision. In this paper, we observe that the …

Contrastive learning for weakly supervised phrase grounding

T Gupta, A Vahdat, G Chechik, X Yang, J Kautz… - … on Computer Vision, 2020 - Springer
Phrase grounding, the problem of associating image regions to caption words, is a crucial
component of vision-language tasks. We show that phrase grounding can be learned by …

Improved visual grounding through self-consistent explanations

R He, P Cascante-Bonilla, Z Yang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Vision-and-language models trained to match images with text can be combined with visual
explanation methods to point to the locations of specific objects in an image. Our work …

Multimodal brain tumor detection using multimodal deep transfer learning

P Razzaghi, K Abbasi, M Shirazi, S Rashidi - Applied Soft Computing, 2022 - Elsevier
MRI brain image analysis, including brain tumor detection, is a challenging task. MRI images
are multimodal, and in recent years, multimodal medical image analysis has gotten more …

PhraseCut: Language-based image segmentation in the wild

C Wu, Z Lin, S Cohen, T Bui… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
We consider the problem of segmenting image regions given a natural language phrase,
and study it on a novel dataset of 77,262 images and 345,486 phrase-region pairs. Our …