Deeply supervised multimodal attentional translation embeddings for visual relationship detection

N Gkanatsios, V Pitsikalis, P Koutras, A Zlatintsi, P Maragos

2019 IEEE International Conference on Image Processing (ICIP), 2019 · ieeexplore.ieee.org

Detecting visual relationships, i.e., <Subject, Predicate, Object> triplets, has been a challenging scene understanding task, approached in the past via linguistic priors or spatial information in a single feature branch. We introduce a new deeply supervised two-branch architecture, the Multimodal Attentional Translation Embeddings, where the visual features of each branch are driven by a multimodal attentional mechanism that exploits spatio-linguistic similarities in a low-dimensional space. We present a variety of experiments comparing against all related approaches in the literature, including re-implemented and fine-tuned versions of several of them. Results on the commonly employed VRD dataset [1] show that the proposed method clearly outperforms all others, and we also justify our claims both quantitatively and qualitatively.
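To make the two ideas named in the abstract more concrete, here is a minimal, hypothetical sketch in plain Python. It is not the authors' implementation. It assumes the generic translation-embedding formulation (as in TransE/VTransE), where a relationship holds when subject + predicate ≈ object in a shared low-dimensional space, and it models the attention step as a softmax over dot-product similarities with a context vector. That vector stands in for a spatio-linguistic embedding. All function names, vectors, and dimensions are illustrative assumptions. The two-branch structure and deep supervision are not modelled.

```python
import math
from typing import Dict, List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the max score before exponentiating, for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(features: List[Vector], context: Vector) -> Vector:
    """Pool visual feature vectors, weighting each by its similarity to `context`.

    `context` is a hypothetical stand-in for a spatio-linguistic embedding;
    the paper's actual attention mechanism may differ.
    """
    weights = softmax([dot(f, context) for f in features])
    dim = len(features[0])
    return [sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)]


def translation_score(subj: Vector, pred: Vector, obj: Vector) -> float:
    """Score a triplet as -||subj + pred - obj||^2 (higher means more plausible)."""
    return -sum((s + p - o) ** 2 for s, p, o in zip(subj, pred, obj))


def predict_predicate(subj: Vector, obj: Vector, predicates: Dict[str, Vector]) -> str:
    """Return the predicate whose embedding best 'translates' subject to object."""
    return max(predicates, key=lambda name: translation_score(subj, predicates[name], obj))


if __name__ == "__main__":
    # Toy 2-D embeddings, chosen by hand purely for illustration.
    predicates = {"on": [0.0, 1.0], "next to": [1.0, 0.0]}
    subj = attend([[1.0, 0.0], [0.0, 1.0]], context=[10.0, 0.0])
    obj = [1.0, 1.0]
    print(predict_predicate(subj, obj, predicates))
```

In the toy run, the context vector pushes the attention weights almost entirely onto the first feature vector. The pooled subject is then close to [1, 0], and adding the "on" embedding lands on the object exactly. In a trained model, the projections into this space and the predicate embeddings would be learned rather than hand-set.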

Cited by: 20 · All 6 versions
