An End-to-End Deep Learning Approach for Video Captioning Through Mobile Devices

RJ Pezzuto Damaceno, RM Cesar Jr - Iberoamerican Congress on Pattern …, 2023 - Springer
Video captioning is a computer vision task that aims at generating a description for video
content. This can be achieved using deep learning approaches that leverage image and …

A mobile device framework for video captioning using multimodal neural networks

RJP Damaceno, RM Cesar Jr - Anais Estendidos do XXXVI …, 2023 - sol.sbc.org.br
Video captioning is a computer vision task aimed at providing textual descriptions for videos.
There are numerous strategies and datasets that can be employed to create models capable …

[HTML][HTML] Exploring deep learning approaches for video captioning: A comprehensive review

AJ Yousif, MH Al-Jammas - e-Prime-Advances in Electrical Engineering …, 2023 - Elsevier
While humans can easily describe visual data at varying levels of detail, the same task
presents a significant challenge for machines. This challenge becomes even more complex …

Local feature‐based video captioning with multiple classifier and CARU‐attention

SK Im, KH Chan - IET Image Processing, 2024 - Wiley Online Library
Video captioning aims to identify multiple objects and their behaviours in a video event and
generate captions for the current scene. This task aims to generate a detailed description of …

Efficient video captioning with frame similarity-based filtering

E Rashno, F Zulkernine - … Conference on Database and Expert Systems …, 2023 - Springer
Video captioning combines computer vision and Natural Language Processing (NLP) to
perform the challenging task of scene understanding. The rapid advancements in artificial …

An efficient deep learning‐based video captioning framework using multi‐modal features

S Varma, DP James - Expert Systems, 2021 - Wiley Online Library
Visual understanding has become more significant in gathering information in many real‐life
applications. For a human, it is a trivial task to understand the content in a visual, however …

Learning deep spatiotemporal features for video captioning

E Daskalakis, M Tzelepi, A Tefas - Pattern Recognition Letters, 2018 - Elsevier
In this paper, we propose a novel automatic video captioning system which translates videos
to sentences, utilizing a deep neural network that is composed of three building parts of …

Spatio-temporal attention models for grounded video captioning

M Zanfir, E Marinoiu, C Sminchisescu - Computer Vision–ACCV 2016 …, 2017 - Springer
Automatic video captioning is challenging due to the complex interactions in dynamic real
scenes. A comprehensive system would ultimately localize and track the objects, actions …

Video Captioning Using Deep Learning Approach-A Comprehensive Survey

J Jacob, VP Devassia - International Conference on Intelligent Vision and …, 2022 - Springer
Video captioning is a technique used to represent videos using phrases of natural language.
This technique automatically generates phrases to explain the contents of input videos …

Semantic enhanced video captioning with multi-feature fusion

TZ Niu, SS Dong, ZD Chen, X Luo, S Guo… - ACM Transactions on …, 2023 - dl.acm.org
Video captioning aims to automatically describe a video clip with informative sentences. At
present, deep learning-based models have become the mainstream for this task and …