Focal visual-text attention for memex question answering

S Lu, M Liu, L Yin, Z Yin, X Liu, W Zheng - PeerJ Computer Science, 2023 - peerj.com

Visual Question Answering (VQA) is a significant cross-disciplinary issue in the
fields of computer vision and natural language processing that requires a computer to output …

Cited by 179 · Related articles · All 8 versions

[PDF] sciencedirect.com

Multimodal research in vision and language: A review of current and emerging trends

S Uppal, S Bhagat, D Hazarika, N Majumder, S Poria… - Information …, 2022 - Elsevier

Deep Learning and its applications have cascaded impactful research and development
with a diverse range of modalities present in the real-world data. More recently, this has …

Cited by 96 · Related articles · All 5 versions

[PDF] researchgate.net

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer

In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …

Cited by 188 · Related articles · All 8 versions

[PDF] thecvf.com

The garden of forking paths: Towards multi-future trajectory prediction

J Liang, L Jiang, K Murphy, T Yu… - Proceedings of the …, 2020 - openaccess.thecvf.com

This paper studies the problem of predicting the distribution over multiple possible future
paths of people as they move through various visual scenes. We make two main …

Cited by 172 · Related articles · All 11 versions

[PDF] arxiv.org

Entity-enhanced adaptive reconstruction network for weakly supervised referring expression grounding

X Liu, L Li, S Wang, ZJ Zha, Z Li… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Weakly supervised Referring Expression Grounding (REG) aims to ground a particular
target in an image described by a language expression while lacking the correspondence …

Cited by 33 · Related articles · All 6 versions

Digital twin-driven focal modulation-based convolutional network for intelligent fault diagnosis

S Li, Q Jiang, Y Xu, K Feng, Y Wang, B Sun… - Reliability Engineering & …, 2023 - Elsevier

Rolling bearings are essential components of various rotating machinery and are critical in
ensuring safe and reliable industrial production. Deep learning techniques have …

Cited by 12 · Related articles · All 3 versions

[PDF] arxiv.org

SimAug: Learning Robust Representations from Simulation for Trajectory Prediction

J Liang, L Jiang, A Hauptmann - … Conference, Glasgow, UK, August 23–28 …, 2020 - Springer

This paper studies the problem of predicting future trajectories of people in unseen cameras
of novel scenarios and views. We approach this problem through the real-data-free setting in …

Cited by 75 · Related articles · All 9 versions

[PDF] researchgate.net

Reasoning on the relation: Enhancing visual representation for visual question answering and cross-modal retrieval

J Yu, W Zhang, Y Lu, Z Qin, Y Hu… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

Cross-modal analysis has become a promising direction for artificial intelligence. Visual
representation is crucial for various cross-modal analysis tasks that require visual content …

Cited by 75 · Related articles · All 3 versions

[PDF] arxiv.org

Temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

Cited by 35 · Related articles · All 8 versions

[PDF] github.io

Cross-domain image captioning via cross-modal retrieval and model adaptation

W Zhao, X Wu, J Luo - IEEE Transactions on Image Processing, 2020 - ieeexplore.ieee.org

In recent years, large scale datasets of paired images and sentences have enabled the
remarkable success in automatically generating descriptions for images, namely image …

Cited by 50 · Related articles · All 6 versions
