VideoDistill: Language-aware Vision Distillation for Video Question Answering

B Zou, C Yang, Y Qiao, C Quan, Y Zhao - arXiv preprint arXiv:2404.00973, 2024 - arxiv.org
Significant advancements in video question answering (VideoQA) have been made thanks
to thriving large image-language pretraining frameworks. Although these image-language …