Deep Learning and its applications have cascaded impactful research and development with a diverse range of modalities present in the real-world data. More recently, this has …
Abstract Vision-Language Pre-training (VLP) models have shown promising capabilities in grounding natural language in image data, facilitating a broad range of cross-modal tasks …
We present a causal inference framework to improve Weakly-Supervised Semantic Segmentation (WSSS). Specifically, we aim to generate better pixel-level pseudo-masks by …
Existing research addresses scene graph generation (SGG)—a critical technology for scene understanding in images—from a detection perspective, ie., objects are detected using …
X Zhu, Z Li, X Wang, X Jiang, P Sun… - … on Knowledge and …, 2022 - ieeexplore.ieee.org
Recent years have witnessed the resurgence of knowledge engineering which is featured by the fast growth of knowledge graphs. However, most of existing knowledge graphs are …
Today's scene graph generation (SGG) task is still far from practical, mainly due to the severe training bias, eg, collapsing diverse" human walk on/sit on/lay on beach" into" human …
R Li, S Zhang, B Wan, X He - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Scene graph generation is an important visual understanding task with a broad range of vision applications. Despite recent tremendous progress, it remains challenging due to the …
Abstract We propose Scene Graph Auto-Encoder (SGAE) that incorporates the language inductive bias into the encoder-decoder image captioning framework for more human-like …
Y Ding, J Yu, B Liu, Y Hu, M Cui… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Abstract Knowledge-based visual question answering requires the ability of associating external knowledge for open-ended cross-modal scene understanding. One limitation of …