P Li, T Wang, X Zhao, X Xu, M Song - Pattern Recognition, 2025 - Elsevier
Video captioning generates a sentence that describes the video content. Existing methods
always require a large number of captions (e.g., 10 or 20) per video to train the model, which is …