Authors
Dong Zhang, Jinhui Tang, Kwang-Ting Cheng
Publication date
2022/10/10
Book
Proceedings of the 30th ACM International Conference on Multimedia
Pages
2380-2389
Description
Capturing the long-range dependencies has empirically proven to be effective on a wide range of computer vision tasks. The progressive advances on this topic have been made through the employment of the transformer framework with the help of the multi-head attention mechanism. However, the attention-based image patch interaction potentially suffers from problems of redundant interactions of intra-class patches and unoriented interactions of inter-class patches. In this paper, we propose a novel Graph Reasoning Transformer (GReaT) for image parsing to enable image patches to interact following a relation reasoning pattern. Specifically, the linearly embedded image patches are first projected into the graph space, where each node represents the implicit visual center for a cluster of image patches and each edge reflects the relation weight between two adjacent nodes. After that, global relation reasoning is …
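The projection step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the description does not say how patches are assigned to nodes or how edge weights are computed, so the soft assignment by dot-product similarity, the softmax-normalized adjacency, and the names `project_to_graph` and `centers` are all assumptions chosen for illustration. The reasoning step that follows in the truncated text is not sketched.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def project_to_graph(patches, centers):
    """Project patch embeddings into a small graph (illustrative only).

    patches: (N, D) linearly embedded image patches.
    centers: (K, D) hypothetical node prototypes (assumed learnable).
    """
    # Soft assignment of every patch to every node, shape (N, K).
    assign = softmax(patches @ centers.T, axis=1)
    # Each node is the assignment-weighted mean of its patches, shape (K, D),
    # standing in for the "implicit visual center" of a patch cluster.
    weights = assign / assign.sum(axis=0, keepdims=True)
    nodes = weights.T @ patches
    # Edge weights as row-normalized node similarity, shape (K, K).
    # This particular relation weight is an assumption.
    adjacency = softmax(nodes @ nodes.T, axis=1)
    return assign, nodes, adjacency


rng = np.random.default_rng(0)
patches = rng.normal(size=(16, 8))   # 16 patches, 8-dim embeddings
centers = rng.normal(size=(4, 8))    # 4 graph nodes
assign, nodes, adjacency = project_to_graph(patches, centers)
print(assign.shape, nodes.shape, adjacency.shape)
```

Using far fewer nodes than patches is what would make interaction in the graph space cheaper than all-pairs patch attention; the node count of 4 here is arbitrary.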