Multi-level visual representation with semantic-reinforced learning for video captioning

M Abdar, M Kollati, S Kuraparthi… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Video captioning (VC) is a fast-moving, cross-disciplinary area of research that comprises
contributions from domains such as computer vision, natural language processing …

Cited by 21 · Related articles · All 3 versions

[PDF] arxiv.org

Video captioning: a comparative review of where we are and which could be the route

D Moctezuma, T Ramírez-delReal, G Ruiz… - Computer Vision and …, 2023 - Elsevier

Video captioning is the process of describing the content of a sequence of images capturing
its semantic relationships and meanings. Dealing with this task with a single image is …

Cited by 13 · Related articles · All 4 versions

[PDF] arxiv.org

Pseudo-labeling with keyword refining for few-supervised video captioning

P Li, T Wang, X Zhao, X Xu, M Song - Pattern Recognition, 2025 - Elsevier

Video captioning generates a sentence that describes the video content. Existing methods
always require a number of captions (e.g., 10 or 20) per video to train the model, which is …

Cited by 1 · Related articles · All 4 versions

Time–frequency recurrent transformer with diversity constraint for dense video captioning

P Li, P Zhang, T Wang, H Xiao - Information Processing & Management, 2023 - Elsevier

Describing a long video using multiple sentences, i.e., dense video captioning, is a very
challenging task. Existing methods neglect the important fact that the actions of several …

Cited by 10 · Related articles · All 2 versions

Data-driven personalisation of television content: a survey

L Nixon, J Foss, K Apostolidis, V Mezaris - Multimedia Systems, 2022 - Springer

This survey considers the vision of TV broadcasting where content is personalised and
personalisation is data-driven, looks at the AI and data technologies making this possible …

Cited by 6 · Related articles · All 3 versions

[PDF] arxiv.org

Chinaopen: A dataset for open-world multimodal learning

A Chen, Z Wang, C Dong, K Tian, R Zhao… - Proceedings of the 31st …, 2023 - dl.acm.org

This paper introduces ChinaOpen, a dataset sourced from Bilibili, a popular Chinese video-
sharing website, for open-world multimodal learning. While the state-of-the-art multimodal …

Cited by 7 · Related articles · All 4 versions

Side Information Extraction using Bernoulli Distribution based Deep Learning Technique for Video Transmission

G Ajitha, I SanthiPrabha - Journal of Electrical Engineering & Technology, 2024 - Springer

One method for improving the quality of video transmission is video encoding. The
video encoding technique must compress the video files in a way that does not compromise the …

[PDF] Renmin University of China at TRECVID 2021: Searching and Describing Video

X Li, A Chen, F Hu, X Chen, C Dong, G Yang - www-nlpir.nist.gov

In this paper, we summarize our TRECVID 2021 experiments. We participated in two tasks:
Ad-hoc Video Search (AVS) and Video-to-Text Description Generation (VTT). For the AVS …
