Video Understanding with Large Language Models: A Survey

Y Tang, J Bi, S Xu, L Song, S Liang, T Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
With the burgeoning growth of online video platforms and the escalating volume of video
content, the demand for proficient video understanding tools has intensified markedly. Given …

Beyond Training: Dynamic Token Merging for Zero-Shot Video Understanding

Y Zhang, Z Zhao, Z Chen, Z Ding, X Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in multimodal large language models (MLLMs) have opened new
avenues for video understanding. However, achieving high fidelity in zero-shot video tasks …