Multi-view features and hybrid reward strategies for vatex video captioning challenge 2019

Scalable and accurate self-supervised multimodal representation learning without aligned video and text data

V Lialin, S Rawls, D Chan, S Ghosh… - Proceedings of the …, 2023 - openaccess.thecvf.com

Scaling up weakly-supervised datasets has shown to be highly effective in the image-text
domain and has contributed to most of the recent state-of-the-art computer vision and …

Cited by 9 Related articles All 6 versions

[PDF] arxiv.org

A comprehensive review on recent methods and challenges of video description

A Singh, TD Singh, S Bandyopadhyay - arXiv preprint arXiv:2011.14752, 2020 - arxiv.org

Video description involves the generation of the natural language description of actions,
events, and objects in the video. There are various applications of video description by filling …

Cited by 7 Related articles All 3 versions

[PDF] researchgate.net

Deep learning based video captioning in bengali

AH Raj, A Seum, A Dash, S Islam… - 2021 26th International …, 2021 - ieeexplore.ieee.org

Generating meaningful textual descriptions from visual contents having the context in
consideration is very challenging in terms of Natural Language Processing (NLP) and …

Cited by 9 Related articles All 2 versions

[PDF] uni-augsburg.de

Automatic generation of natural language descriptions of visual data: describing images and videos using recurrent and self-attentive models

P Harzig - 2022 - opus.bibliothek.uni-augsburg.de

Humans are faced with a constant flow of visual stimuli, e.g., from the environment or when
looking at social media. In contrast, visually-impaired people are often incapable to perceive …

Cited by 1 Related articles All 3 versions

[PDF] academia.edu

[PDF] A Comprehensive Review on Recent Methods and Challenges of Video

A SINGH, TD SINGH… - arXiv preprint arXiv …, 2020 - academia.edu

Authors' address: Alok Singh, alok_rs@cse.nits.ac.in; Thoudam Doren Singh, doren@cse.nits.ac.in;
Sivaji Bandyopadhyay, sivaji.cse.ju@gmail.com, Centre for Natural …
