Y Zhang, Z Zhao, Z Chen, Z Ding, X Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in multimodal large language models (MLLMs) have opened new avenues for video understanding. However, achieving high fidelity in zero-shot video tasks …