Multimodal machine learning (MML) is a compelling multidisciplinary research area in which heterogeneous data from multiple modalities and machine learning (ML) are combined to …
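To make the idea of combining modalities concrete, a minimal late-fusion sketch in PyTorch is shown below. This is an illustrative pattern only, not the method of the work excerpted above; the `LateFusionClassifier` name, the feature dimensions, and the random stand-in features are assumptions.

```python
# Minimal late-fusion sketch: one common way to combine heterogeneous
# modalities (e.g., an image and a text report) for a shared prediction.
# Illustrative only; encoders are replaced by precomputed feature vectors.
import torch
import torch.nn as nn

class LateFusionClassifier(nn.Module):
    def __init__(self, img_dim=512, txt_dim=256, hidden=128, num_classes=2):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, hidden)       # project image features
        self.txt_proj = nn.Linear(txt_dim, hidden)       # project text features
        self.head = nn.Linear(2 * hidden, num_classes)   # classify the fused vector

    def forward(self, img_feat, txt_feat):
        # Concatenate per-modality embeddings and classify the joint representation.
        fused = torch.cat([torch.relu(self.img_proj(img_feat)),
                           torch.relu(self.txt_proj(txt_feat))], dim=-1)
        return self.head(fused)

# Toy usage with random tensors standing in for real encoder outputs.
model = LateFusionClassifier()
logits = model(torch.randn(4, 512), torch.randn(4, 256))
print(logits.shape)  # torch.Size([4, 2])
```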
Multi-modal data, such as radiology images and reports, abounds in biomedicine. Interpreting this data at scale is essential for improving clinical care and accelerating clinical …
Despite the remarkable accuracy of deep neural networks in object detection, they are costly to train and scale due to their supervision requirements. In particular, learning more object …
X Zhu, Z Li, X Wang, X Jiang, P Sun… - … on Knowledge and …, 2022 - ieeexplore.ieee.org
Recent years have witnessed a resurgence of knowledge engineering, characterized by the rapid growth of knowledge graphs. However, most existing knowledge graphs are …
Existing Referring Image Segmentation (RIS) methods typically require expensive pixel-level or box-level annotations for supervision. In this paper, we observe that the …
Phrase grounding, the problem of associating image regions with caption words, is a crucial component of vision-language tasks. We show that phrase grounding can be learned by …
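The core association step can be sketched as a word-region similarity computation: score every (region, word) pair and assign each word to its best-matching region. The dot-product score, the feature shapes, and the `ground_words` helper below are illustrative assumptions, not the learning procedure proposed in the excerpted paper.

```python
# Illustrative phrase-grounding scoring step: associate caption words with
# image regions via a similarity matrix. Features are random stand-ins.
import torch

def ground_words(region_feats, word_feats):
    """region_feats: (R, D) region embeddings; word_feats: (W, D) word embeddings."""
    sim = word_feats @ region_feats.t()   # (W, R) word-region scores
    attn = sim.softmax(dim=-1)            # soft attention of each word over regions
    best_region = sim.argmax(dim=-1)      # hard assignment: best region per word
    return attn, best_region

attn, best = ground_words(torch.randn(5, 256), torch.randn(8, 256))
print(attn.shape, best.shape)  # torch.Size([8, 5]) torch.Size([8])
```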
Vision-and-language models trained to match images with text can be combined with visual explanation methods to point to the locations of specific objects in an image. Our work …
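A hedged sketch of that idea: compute an image-text matching score with a tiny stand-in model, then back-propagate the score into the image feature map, Grad-CAM style, to obtain a localization map for the described object. The toy encoders, shapes, and the `ToyMatcher` name are assumptions; the excerpted work's actual models and explanation method may differ.

```python
# Grad-CAM-style localization from an image-text matching score (illustrative).
import torch
import torch.nn as nn

class ToyMatcher(nn.Module):
    def __init__(self, dim=32):
        super().__init__()
        self.conv = nn.Conv2d(3, dim, 3, padding=1)  # toy image encoder
        self.txt = nn.Embedding(1000, dim)           # toy text encoder

    def forward(self, image, token_ids):
        fmap = torch.relu(self.conv(image))          # (B, C, H, W) feature map
        img_vec = fmap.mean(dim=(2, 3))              # pooled image embedding
        txt_vec = self.txt(token_ids).mean(dim=1)    # pooled text embedding
        score = (img_vec * txt_vec).sum(dim=-1)      # image-text matching score
        return score, fmap

model = ToyMatcher()
score, fmap = model(torch.randn(1, 3, 32, 32), torch.tensor([[3, 17, 42]]))

# Weight each feature channel by the gradient of the matching score, then sum:
# high values indicate image locations that drive the image-text match.
grads = torch.autograd.grad(score.sum(), fmap)[0]
weights = grads.mean(dim=(2, 3), keepdim=True)       # per-channel importance
cam = torch.relu((weights * fmap).sum(dim=1))        # (B, H, W) localization map
print(cam.shape)  # torch.Size([1, 32, 32])
```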
MRI brain image analysis, including brain tumor detection, is a challenging task. MRI images are multimodal, and in recent years multimodal medical image analysis has received more …
C Wu, Z Lin, S Cohen, T Bui… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
We consider the problem of segmenting image regions given a natural language phrase, and study it on a novel dataset of 77,262 images and 345,486 phrase-region pairs. Our …