A comprehensive survey of scene graphs: Generation and application

X Chang, P Ren, P Xu, Z Li, X Chen… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Scene graph is a structured representation of a scene that can clearly express the objects,
attributes, and relationships between objects in the scene. As computer vision technology …

Multimodal research in vision and language: A review of current and emerging trends

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier
Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …

CPT: Colorful prompt tuning for pre-trained vision-language models

Y Yao, A Zhang, Z Zhang, Z Liu, TS Chua, M Sun - AI Open, 2024 - Elsevier
Vision-Language Pre-training (VLP) models have shown promising capabilities in
grounding natural language in image data, facilitating a broad range of cross-modal tasks …

Causal intervention for weakly-supervised semantic segmentation

D Zhang, H Zhang, J Tang… - Advances in Neural …, 2020 - proceedings.neurips.cc
We present a causal inference framework to improve Weakly-Supervised Semantic
Segmentation (WSSS). Specifically, we aim to generate better pixel-level pseudo-masks by …

Panoptic scene graph generation

J Yang, YZ Ang, Z Guo, K Zhou, W Zhang… - European Conference on …, 2022 - Springer
Existing research addresses scene graph generation (SGG), a critical technology for scene
understanding in images, from a detection perspective, i.e., objects are detected using …

Multi-modal knowledge graph construction and application: A survey

X Zhu, Z Li, X Wang, X Jiang, P Sun… - … on Knowledge and …, 2022 - ieeexplore.ieee.org
Recent years have witnessed the resurgence of knowledge engineering which is featured
by the fast growth of knowledge graphs. However, most of existing knowledge graphs are …

Unbiased scene graph generation from biased training

K Tang, Y Niu, J Huang, J Shi… - Proceedings of the …, 2020 - openaccess.thecvf.com
Today's scene graph generation (SGG) task is still far from practical, mainly due to the
severe training bias, e.g., collapsing diverse "human walk on/sit on/lay on beach" into "human …

Bipartite graph network with adaptive message passing for unbiased scene graph generation

R Li, S Zhang, B Wan, X He - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Scene graph generation is an important visual understanding task with a broad range of
vision applications. Despite recent tremendous progress, it remains challenging due to the …

Auto-encoding scene graphs for image captioning

X Yang, K Tang, H Zhang, J Cai - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
We propose Scene Graph Auto-Encoder (SGAE) that incorporates the language
inductive bias into the encoder-decoder image captioning framework for more human-like …

MuKEA: Multimodal knowledge extraction and accumulation for knowledge-based visual question answering

Y Ding, J Yu, B Liu, Y Hu, M Cui… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Knowledge-based visual question answering requires the ability of associating
external knowledge for open-ended cross-modal scene understanding. One limitation of …