Structured attentions for visual question answering

Y Li, J Zeng, S Shan, X Chen - IEEE Transactions on Image …, 2018 - ieeexplore.ieee.org

Facial expression recognition in the wild is challenging due to various unconstrained
conditions. Although existing facial expression classifiers have been almost perfect on …

Cited by 870 · Related articles · All 8 versions

[PDF] arxiv.org

CLEVRER: Collision events for video representation and reasoning

K Yi, C Gan, Y Li, P Kohli, J Wu, A Torralba… - arXiv preprint arXiv …, 2019 - arxiv.org

The ability to reason about temporal and causal events from videos lies at the core of human
intelligence. Most video reasoning benchmarks, however, focus on pattern recognition from …

Cited by 477 · Related articles · All 6 versions

[PDF] neurips.cc

Neural-symbolic VQA: Disentangling reasoning from vision and language understanding

K Yi, J Wu, C Gan, A Torralba, P Kohli… - Advances in neural …, 2018 - proceedings.neurips.cc

We marry two powerful ideas: deep representation learning for visual recognition and
language understanding, and symbolic program execution for reasoning. Our neural …

Cited by 710 · Related articles · All 9 versions

[PDF] ieee.org

Deep multimodal representation learning: A survey

W Guo, J Wang, S Wang - IEEE Access, 2019 - ieeexplore.ieee.org

Multimodal representation learning, which aims to narrow the heterogeneity gap among
different modalities, plays an indispensable role in the utilization of ubiquitous multimodal …

Cited by 507 · Related articles · All 4 versions

[PDF] thecvf.com

CAMP: Cross-modal adaptive message passing for text-image retrieval

Z Wang, X Liu, H Li, L Sheng, J Yan… - Proceedings of the …, 2019 - openaccess.thecvf.com

Text-image cross-modal retrieval is a challenging task in the field of language and vision.
Most previous approaches independently embed images and sentences into a joint …

Cited by 356 · Related articles · All 8 versions

[PDF] thecvf.com

Relation-aware graph attention network for visual question answering

L Li, Z Gan, Y Cheng, J Liu - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com

In order to answer semantically-complicated questions about an image, a Visual Question
Answering (VQA) model needs to fully understand the visual scene in the image, especially …

Cited by 416 · Related articles · All 8 versions

[PDF] ieee.org

A survey of vehicle re-identification based on deep learning

H Wang, J Hou, N Chen - IEEE Access, 2019 - ieeexplore.ieee.org

Vehicle re-identification is one of the core technologies of intelligent transportation systems,
and it is crucial for the construction of smart cities. With the rapid development of deep …

Cited by 67 · Related articles · All 2 versions

[PDF] thecvf.com

Dynamic fusion with intra- and inter-modality attention flow for visual question answering

P Gao, Z Jiang, H You, P Lu… - Proceedings of the …, 2019 - openaccess.thecvf.com

Learning effective fusion of multi-modality features is at the heart of visual question
answering. We propose a novel method of dynamically fusing multi-modal features with intra …

Cited by 342 · Related articles · All 9 versions

[PDF] thecvf.com

Viewpoint-aware attentive multi-view inference for vehicle re-identification

Y Zhou, L Shao - Proceedings of the IEEE conference on …, 2018 - openaccess.thecvf.com

Vehicle re-identification (re-ID) has huge potential to contribute to intelligent video
surveillance. However, it suffers from challenges that different vehicle identities with a similar …

Cited by 383 · Related articles · All 5 versions

[PDF] thecvf.com

RAVEN: A dataset for relational and analogical visual reasoning

C Zhang, F Gao, B Jia, Y Zhu… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

Dramatic progress has been witnessed in basic vision tasks involving low-level perception,
such as object recognition, detection, and tracking. Unfortunately, there is still enormous …

Cited by 304 · Related articles · All 13 versions
