A review on methods and applications in multimodal deep learning

S Jabeen, X Li, MS Amin, O Bourahla, S Li… - ACM Transactions on …, 2023 - dl.acm.org
Deep learning has enabled a wide range of applications and has become increasingly
popular in recent years. The goal of multimodal deep learning (MMDL) is to create models …

Recent advances and trends in multimodal deep learning: A review

J Summaira, X Li, AM Shoib, S Li, J Abdul - arXiv preprint arXiv …, 2021 - arxiv.org
Deep learning has enabled a wide range of applications and has become increasingly
popular in recent years. The goal of multimodal deep learning is to create models that can …

Generating diverse and natural 3d human motions from text

C Guo, S Zou, X Zuo, S Wang, W Ji… - Proceedings of the …, 2022 - openaccess.thecvf.com
Automated generation of 3D human motions from text is a challenging problem. The
generated motions are expected to be sufficiently diverse to explore the text-grounded …
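
The diversity requirement is usually handled by sampling: a text-conditioned latent variable is drawn repeatedly and decoded into different motion sequences for the same sentence. Below is a minimal, generic conditional-VAE-style sketch of that sampling step; the module and its names (e.g. ToyTextToMotion) are hypothetical illustrations, not the paper's model.

```python
# Generic conditional-VAE-style sketch: different latent draws decode to
# different motions for the same text. Not Guo et al.'s actual architecture.
import torch
import torch.nn as nn

class ToyTextToMotion(nn.Module):
    def __init__(self, text_dim=512, latent_dim=128, pose_dim=66, seq_len=60):
        super().__init__()
        self.prior = nn.Linear(text_dim, 2 * latent_dim)          # text -> (mu, log_var)
        self.decoder = nn.GRU(latent_dim, 256, batch_first=True)
        self.to_pose = nn.Linear(256, pose_dim)
        self.seq_len = seq_len

    def sample(self, text_emb):                                   # text_emb: (B, text_dim)
        mu, log_var = self.prior(text_emb).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * log_var).exp()     # reparameterised draw
        z_seq = z.unsqueeze(1).repeat(1, self.seq_len, 1)         # feed z at every time step
        hidden, _ = self.decoder(z_seq)
        return self.to_pose(hidden)                               # (B, seq_len, pose_dim) poses

model = ToyTextToMotion()
text = torch.randn(1, 512)                                        # placeholder sentence embedding
motions = [model.sample(text) for _ in range(3)]                  # same text, three distinct samples
```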

Swinbert: End-to-end transformers with sparse attention for video captioning

K Lin, L Li, CC Lin, F Ahmed, Z Gan… - Proceedings of the …, 2022 - openaccess.thecvf.com
The canonical approach to video captioning dictates that a caption generation model learn
from offline-extracted dense video features. These feature extractors usually operate on …
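
The sparse-attention idea can be pictured as a learnable attention mask over the dense video tokens, pushed toward zero by a sparsity penalty. The sketch below is only a schematic of that idea with invented names, not SwinBERT's implementation.

```python
# Schematic of a learnable sparse attention mask with an L1 sparsity penalty.
import torch
import torch.nn as nn

class LearnableSparseAttention(nn.Module):
    def __init__(self, dim, num_tokens):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        # Learnable token-to-token mask logits over the dense video tokens.
        self.mask_logits = nn.Parameter(torch.zeros(num_tokens, num_tokens))
        self.scale = dim ** -0.5

    def forward(self, x):                          # x: (B, N, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = q @ k.transpose(-2, -1) * self.scale
        mask = torch.sigmoid(self.mask_logits)     # values in (0, 1)
        attn = scores.softmax(dim=-1) * mask
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-6)
        sparsity_loss = mask.abs().mean()          # drives most mask entries toward 0
        return attn @ v, sparsity_loss

x = torch.randn(2, 32, 256)                        # 2 clips, 32 video tokens
out, sparsity = LearnableSparseAttention(256, 32)(x)
```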

End-to-end dense video captioning with parallel decoding

T Wang, R Zhang, Z Lu, F Zheng… - Proceedings of the …, 2021 - openaccess.thecvf.com
Dense video captioning aims to generate multiple associated captions with their temporal
locations from the video. Previous methods follow a sophisticated "localize-then-describe" …
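
Parallel decoding replaces the localize-then-describe pipeline with a fixed set of learnable event queries decoded simultaneously into segment proposals and captions, DETR-style. The following minimal sketch shows that decoding pattern only; module and head names are hypothetical, not the paper's API.

```python
# DETR-style parallel decoding sketch: every event query yields one
# (segment, caption) proposal in a single forward pass.
import torch
import torch.nn as nn

class ParallelEventDecoder(nn.Module):
    def __init__(self, d_model=256, num_queries=10, num_layers=2, vocab_size=10000):
        super().__init__()
        self.event_queries = nn.Parameter(torch.randn(num_queries, d_model))
        layer = nn.TransformerDecoderLayer(d_model, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers)
        self.segment_head = nn.Linear(d_model, 2)        # (center, length) per event
        self.word_head = nn.Linear(d_model, vocab_size)  # toy caption head

    def forward(self, frame_feats):                      # frame_feats: (B, T, d_model)
        B = frame_feats.size(0)
        queries = self.event_queries.unsqueeze(0).repeat(B, 1, 1)
        decoded = self.decoder(queries, frame_feats)     # all events decoded in parallel
        return self.segment_head(decoded).sigmoid(), self.word_head(decoded)

feats = torch.randn(2, 64, 256)                          # 2 clips, 64 frame features
segments, word_logits = ParallelEventDecoder()(feats)
```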

Object relational graph with teacher-recommended learning for video captioning

Z Zhang, Y Shi, C Yuan, B Li, P Wang… - Proceedings of the …, 2020 - openaccess.thecvf.com
Taking full advantage of the information from both vision and language is critical for the
video captioning task. Existing models lack adequate visual representation due to the …
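
The relational-graph idea amounts to message passing over detected-object features, with a soft adjacency derived from pairwise similarity. The sketch below shows only that generic pattern; it is not the paper's ORG formulation and omits the teacher-recommended learning component.

```python
# Generic relational message passing over detected-object features.
import torch
import torch.nn as nn

class ObjectRelationLayer(nn.Module):
    def __init__(self, dim=512):
        super().__init__()
        self.relation = nn.Linear(dim, dim)    # scores pairwise relations in an embedded space
        self.update = nn.Linear(dim, dim)

    def forward(self, obj_feats):              # obj_feats: (B, num_objects, dim)
        # Soft adjacency from pairwise similarity of embedded object features.
        rel = self.relation(obj_feats)
        adj = torch.softmax(rel @ obj_feats.transpose(-2, -1), dim=-1)
        # Each object aggregates its neighbours' features, then is re-projected.
        return torch.relu(self.update(adj @ obj_feats)) + obj_feats

objs = torch.randn(2, 20, 512)                 # 2 videos, 20 detected objects each
refined = ObjectRelationLayer()(objs)
```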

AAP-MIT: Attentive Atrous Pyramid Network and Memory Incorporated Transformer for Multisentence Video Description

J Prudviraj, MI Reddy, C Vishnu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Generating multi-sentence descriptions for videos is considered one of the most complex tasks
in computer vision and natural language understanding due to the intricate nature of video …
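
The atrous-pyramid component can be approximated by parallel 1D temporal convolutions with increasing dilation rates, fused back to the original width. This is a simplified illustration that omits the attention and memory-transformer parts; the layer names are invented for the example.

```python
# Simplified temporal atrous (dilated) pyramid over frame features.
import torch
import torch.nn as nn

class TemporalAtrousPyramid(nn.Module):
    def __init__(self, dim=512, dilations=(1, 2, 4)):
        super().__init__()
        # Parallel branches with growing dilation capture short- and long-range context.
        self.branches = nn.ModuleList(
            nn.Conv1d(dim, dim, kernel_size=3, padding=d, dilation=d) for d in dilations
        )
        self.fuse = nn.Conv1d(dim * len(dilations), dim, kernel_size=1)

    def forward(self, frame_feats):                  # frame_feats: (B, T, dim)
        x = frame_feats.transpose(1, 2)              # -> (B, dim, T) for Conv1d
        multi_scale = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        return self.fuse(multi_scale).transpose(1, 2)

feats = torch.randn(2, 64, 512)
context = TemporalAtrousPyramid()(feats)             # (2, 64, 512)
```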

Spatio-temporal graph for video captioning with knowledge distillation

B Pan, H Cai, DA Huang, KH Lee… - Proceedings of the …, 2020 - openaccess.thecvf.com
Video captioning is a challenging task that requires a deep understanding of visual scenes.
State-of-the-art methods generate captions using either scene-level or object-level …
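
Knowledge distillation in this setting generally means training a compact captioning branch to match a richer (e.g. object-level) teacher. A standard soft-target distillation loss, shown below as a generic example rather than the paper's exact scheme, captures the basic mechanism.

```python
# Generic soft-target knowledge-distillation loss (not Pan et al.'s exact scheme).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student distributions."""
    log_p_student = F.log_softmax(student_logits / temperature, dim=-1)
    p_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    # Scale by T^2 so gradients keep a magnitude comparable to the hard-label loss.
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * temperature ** 2

# Toy usage: vocabulary-sized logits for a batch of caption tokens.
student = torch.randn(8, 10000)
teacher = torch.randn(8, 10000)
loss = distillation_loss(student, teacher)
```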

Contrastive attention for automatic chest x-ray report generation

F Liu, C Yin, X Wu, S Ge, Y Zou, P Zhang… - arXiv preprint arXiv …, 2021 - arxiv.org
Recently, chest X-ray report generation, which aims to automatically generate descriptions
of given chest X-ray images, has received growing research interest. The key challenge of …
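
Contrastive attention here refers to comparing the input X-ray against a pool of normal reference images so that the report focuses on abnormal findings. A loose sketch of that compare-and-subtract step follows; the tensor names and dimensions are hypothetical, not the paper's.

```python
# Loose sketch: regions of the input image attend over "normal" reference features,
# and the aggregated normal representation is subtracted to emphasise abnormal content.
import torch
import torch.nn.functional as F

def contrastive_attention(image_feats, normal_pool):
    """image_feats: (N, d) regions of the input image; normal_pool: (M, d) normal references."""
    sim = F.normalize(image_feats, dim=-1) @ F.normalize(normal_pool, dim=-1).T  # (N, M)
    attn = sim.softmax(dim=-1)
    closest_normal = attn @ normal_pool     # per-region aggregation of similar normal features
    return image_feats - closest_normal     # residual highlights what differs from normal

regions = torch.randn(49, 1024)             # e.g. 7x7 grid of CNN features
normals = torch.randn(200, 1024)            # feature pool built from normal chest X-rays
abnormal_cues = contrastive_attention(regions, normals)
```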

Semantic grouping network for video captioning

H Ryu, S Kang, H Kang, CD Yoo - … of the AAAI Conference on Artificial …, 2021 - ojs.aaai.org
This paper considers a video caption generation network, referred to as the Semantic Grouping
Network (SGN), that attempts (1) to group video frames with discriminating word phrases of …
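
Phrase-guided grouping can be pictured as each partially decoded phrase attending over the frame features to pool its own visual evidence. The snippet below is a schematic of that soft assignment only, not SGN's actual architecture; all names are illustrative.

```python
# Schematic phrase-guided frame grouping: soft assignment of frames to phrases.
import torch
import torch.nn.functional as F

def group_frames_by_phrases(phrase_embs, frame_feats, temperature=0.1):
    """phrase_embs: (P, d); frame_feats: (T, d) -> one pooled visual feature per phrase."""
    sim = F.normalize(phrase_embs, dim=-1) @ F.normalize(frame_feats, dim=-1).T   # (P, T)
    weights = (sim / temperature).softmax(dim=-1)   # soft assignment of frames to phrases
    return weights @ frame_feats                    # (P, d) phrase-aligned visual groups

phrases = torch.randn(5, 512)                       # e.g. phrases decoded so far
frames = torch.randn(32, 512)                       # per-frame video features
grouped = group_frames_by_phrases(phrases, frames)
```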