Multimodal learning with transformers: A survey

P Xu, X Zhu, DA Clifton - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Transformer is a promising neural network learner, and has achieved great success in
various machine learning tasks. Thanks to the recent prevalence of multimodal applications …

RelTR: Relation transformer for scene graph generation

Y Cong, MY Yang, B Rosenhahn - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Different objects in the same scene are more or less related to each other, but only a limited
number of these relationships are noteworthy. Inspired by Detection Transformer, which …

Prototype-based embedding network for scene graph generation

C Zheng, X Lyu, L Gao, B Dai… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Current Scene Graph Generation (SGG) methods explore contextual information to
predict relationships among entity pairs. However, due to the diverse visual appearance of …

Scene graph generation: A comprehensive survey

H Li, G Zhu, L Zhang, Y Jiang, Y Dang, H Hou, P Shen… - Neurocomputing, 2024 - Elsevier
Deep learning techniques have led to remarkable breakthroughs in the field of object
detection and have spawned a lot of scene-understanding tasks in recent years. Scene …

Panoptic video scene graph generation

J Yang, W Peng, X Li, Z Guo, L Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Towards building comprehensive real-world visual perception systems, we propose and
study a new problem called panoptic video scene graph generation (PVSG). PVSG is related to …

RLIPv2: Fast scaling of relational language-image pre-training

H Yuan, S Zhang, X Wang, S Albanie… - Proceedings of the …, 2023 - openaccess.thecvf.com
Relational Language-Image Pre-training (RLIP) aims to align vision representations
with relational texts, thereby advancing the capability of relational reasoning in computer …

Learning to generate language-supervised and open-vocabulary scene graph using pre-trained visual-semantic space

Y Zhang, Y Pan, T Yao, R Huang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Scene graph generation (SGG) aims to abstract an image into a graph structure, by
representing objects as graph nodes and their relations as labeled edges. However, two …

Relationformer: A Unified Framework for Image-to-Graph Generation

S Shit, R Koner, B Wittmann, J Paetzold, I Ezhov… - … on Computer Vision, 2022 - Springer
A comprehensive representation of an image requires understanding objects and their
mutual relationship, especially in image-to-graph generation, e.g., road network extraction …

Unbiased scene graph generation in videos

S Nag, K Min, S Tripathi… - Proceedings of the …, 2023 - openaccess.thecvf.com
The task of dynamic scene graph generation (SGG) from videos is complicated and
challenging due to the inherent dynamics of a scene, temporal fluctuation of model …

4D panoptic scene graph generation

J Yang, J Cen, W Peng, S Liu, F Hong… - Advances in …, 2024 - proceedings.neurips.cc
We are living in a three-dimensional space while moving forward through a fourth
dimension: time. To allow artificial intelligence to develop a comprehensive understanding …