Video captioning with attention-based LSTM and semantic consistency

Y Wang, D Zhang, Y Liu, B Dai, LH Lee - Transportation research part C …, 2019 - Elsevier

Abstract Machine learning (ML) plays the core function to intellectualize the transportation
systems. Recent years have witnessed the advent and prevalence of deep learning which …

Cited by 302 Related articles All 5 versions

[HTML] springer.com

[HTML] Video description: A comprehensive survey of deep learning approaches

G Rafiq, M Rafiq, GS Choi - Artificial Intelligence Review, 2023 - Springer

Video description refers to understanding visual content and transforming that acquired
understanding into automatic textual narration. It bridges the key AI fields of computer vision …

Cited by 18 Related articles All 5 versions

[PDF] thecvf.com

Vid2seq: Large-scale pretraining of a visual language model for dense video captioning

A Yang, A Nagrani, PH Seo, A Miech… - Proceedings of the …, 2023 - openaccess.thecvf.com

In this work, we introduce Vid2Seq, a multi-modal single-stage dense event captioning
model pretrained on narrated videos which are readily-available at scale. The Vid2Seq …

Cited by 159 Related articles All 26 versions

[PDF] arxiv.org

A general survey on attention mechanisms in deep learning

G Brauwers, F Frasincar - IEEE Transactions on Knowledge …, 2021 - ieeexplore.ieee.org

Attention is an important mechanism that can be employed for a variety of deep learning
models across many different domains and tasks. This survey provides an overview of the …

Cited by 259 Related articles All 9 versions

[PDF] thecvf.com

End-to-end dense video captioning with parallel decoding

T Wang, R Zhang, Z Lu, F Zheng… - Proceedings of the …, 2021 - openaccess.thecvf.com

Dense video captioning aims to generate multiple associated captions with their temporal
locations from the video. Previous methods follow a sophisticated "localize-then-describe" …

Cited by 174 Related articles All 6 versions

Task-adaptive attention for image captioning

C Yan, Y Hao, L Li, J Yin, A Liu, Z Mao… - … on Circuits and …, 2021 - ieeexplore.ieee.org

Attention mechanisms are now widely used in image captioning models. However, most
attention models only focus on visual features. When generating syntax related words, little …

Cited by 239 Related articles All 2 versions

[PDF] arxiv.org

X-llm: Bootstrapping advanced large language models by treating multi-modalities as foreign languages

F Chen, M Han, H Zhao, Q Zhang, J Shi, S Xu… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models (LLMs) have demonstrated remarkable language abilities. GPT-4,
based on advanced LLMs, exhibits extraordinary multimodal capabilities beyond previous …

Cited by 76 Related articles All 2 versions

[PDF] researchgate.net

Attention, please! A survey of neural attention models in deep learning

A de Santana Correia, EL Colombini - Artificial Intelligence Review, 2022 - Springer

In humans, Attention is a core property of all perceptual and cognitive operations. Given our
limited ability to process competing sources, attention mechanisms select, modulate, and …

Cited by 188 Related articles All 8 versions

[PDF] thecvf.com

Mirrorgan: Learning text-to-image generation by redescription

T Qiao, J Zhang, D Xu, D Tao - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

Generating an image from a given text description has two goals: visual realism and
semantic consistency. Although significant progress has been made in generating high …

Cited by 651 Related articles All 9 versions

AAP-MIT: Attentive Atrous Pyramid Network and Memory Incorporated Transformer for Multisentence Video Description

J Prudviraj, MI Reddy, C Vishnu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Generating multi-sentence descriptions for video is considered to be the most complex task
in computer vision and natural language understanding due to the intricate nature of video …

Cited by 83 Related articles All 4 versions
