Video understanding with large language models: A survey

Y Tang, J Bi, S Xu, L Song, S Liang, T Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
With the burgeoning growth of online video platforms and the escalating volume of video
content, the demand for proficient video understanding tools has intensified markedly. Given …

Improving pretrained language model fine-tuning with noise stability regularization

H Hua, X Li, D Dou, CZ Xu, J Luo - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
The advent of large-scale pretrained language models (PLMs) has contributed greatly to the
progress in natural language processing (NLP). Despite its recent success and wide …

V2Xum-LLM: Cross-modal video summarization with temporal prompt instruction tuning

H Hua, Y Tang, C Xu, J Luo - arXiv preprint arXiv:2404.12353, 2024 - arxiv.org
Video summarization aims to create short, accurate, and cohesive summaries of longer
videos. Despite the existence of various video summarization datasets, a notable limitation …

Scaling Up Video Summarization Pretraining with Large Language Models

DM Argaw, S Yoon, FC Heilbron… - Proceedings of the …, 2024 - openaccess.thecvf.com
Long-form video content constitutes a significant portion of internet traffic, making automated
video summarization an essential research problem. However, existing video summarization …

AVicuna: Audio-Visual LLM with Interleaver and Context-Boundary Alignment for Temporal Referential Dialogue

Y Tang, D Shimada, J Bi, C Xu - arXiv preprint arXiv:2403.16276, 2024 - arxiv.org
In everyday communication, humans frequently use speech and gestures to refer to specific
areas or objects, a process known as Referential Dialogue (RD). While prior studies have …

Previously on... From Recaps to Story Summarization

AK Singh, D Srivastava… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
We introduce multimodal story summarization by leveraging TV episode recaps, short video
sequences interweaving key story moments from previous episodes to bring viewers up to …

Emo-Avatar: Efficient Monocular Video Style Avatar through Texture Rendering

P Liu, L Song, D Zhang, H Hua, Y Tang, H Tu… - arXiv preprint arXiv …, 2024 - arxiv.org
Artistic video portrait generation is a significant and sought-after task in the fields of
computer graphics and vision. While various methods have been developed that integrate …

An Empirical Analysis on Large Language Models in Debate Evaluation

X Liu, P Liu, H He - arXiv preprint arXiv:2406.00050, 2024 - arxiv.org
In this study, we investigate the capabilities and inherent biases of advanced large language
models (LLMs) such as GPT-3.5 and GPT-4 in the context of debate evaluation. We discover …

Learning to Evaluate the Artness of AI-generated Images

J Chen, J An, H Lyu, C Kanan… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Assessing the artness of AI-generated images continues to be a challenge within the realm
of image generation. Most existing metrics cannot be used to perform instance-level and …

A Systematic Survey of Text Summarization: From Statistical Methods to Large Language Models

H Zhang, PS Yu, J Zhang - arXiv preprint arXiv:2406.11289, 2024 - arxiv.org
Text summarization research has undergone several significant transformations with the
advent of deep neural networks, pre-trained language models (PLMs), and recent large …