Authors
Nikolaos Gkanatsios, Vassilis Pitsikalis, Petros Koutras, Petros Maragos
Publication date
2019
Workshop paper
Proceedings of the IEEE International Conference on Computer Vision Workshops
Pages
0-0
Description
We find that most Scene Graph Generation approaches suffer from two limitations: 1) they use generic attention mechanisms and dataset-specific statistics that supersede visual features, and 2) they treat "no interaction" as an extra class that is both noisy and dominant, and prune graph edges manually or by applying simple filters. As a result, such approaches do not scale to different settings and specifications. We propose a three-stage pipeline that employs Multi-Head Attention driven by language and spatial features, Translation Embeddings and Multi-Tasking to detect an interacting pair of objects. Our attentional scheme is able to maximize the visual features' interpretability, as well as to capture the nature of datasets of different scales, while multi-tasking robustly resolves the bias of the background class. We present an experimental overview of the related literature, unveil a multitude of evaluation inconsistencies and provide quantitative and qualitative support with experiments on a variety of datasets, where our approach performs on par with or even outperforms the current state of the art.
Total citations
Citations per year, 2020 to 2024 (chart)
Scholar articles
N Gkanatsios, V Pitsikalis, P Koutras, P Maragos - Proceedings of the IEEE/CVF international conference …, 2019