Y Li, H Fan, R Hu, C Feichtenhofer, K He - arXiv preprint arXiv:2212.00794, 2022 - arxiv.org
We present Fast Language-Image Pre-training (FLIP), a simple and more efficient method
for training CLIP. Our method randomly masks out and removes a large portion of image …
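
The random masking step mentioned above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `random_mask_patches`, the `mask_ratio` default of 0.5, and the representation of an image as a plain list of patches are all assumptions made for illustration. It shows only the idea of dropping a random subset of patches so that an encoder would process fewer tokens.

```python
import random


def random_mask_patches(patches, mask_ratio=0.5, seed=None):
    """Randomly drop a fraction of patches (hypothetical helper).

    patches:    sequence of image patches (any objects), one per grid cell.
    mask_ratio: fraction of patches to remove; 0.5 is an assumed placeholder,
                not a value taken from the source text.
    Returns (kept_patches, kept_indices), with indices in ascending order.
    """
    if not 0.0 <= mask_ratio < 1.0:
        raise ValueError("mask_ratio must be in [0, 1)")
    num_patches = len(patches)
    if num_patches == 0:
        return [], []
    rng = random.Random(seed)
    # Keep at least one patch so the encoder input is never empty.
    num_keep = max(1, int(round(num_patches * (1.0 - mask_ratio))))
    # Sample without replacement, then sort to preserve spatial order.
    kept_indices = sorted(rng.sample(range(num_patches), num_keep))
    kept_patches = [patches[i] for i in kept_indices]
    return kept_patches, kept_indices


if __name__ == "__main__":
    # A 14 x 14 patch grid, with integers standing in for patch tensors.
    grid = list(range(14 * 14))
    kept, idx = random_mask_patches(grid, mask_ratio=0.5, seed=0)
    print(len(grid), "patches in,", len(kept), "patches kept")
```

Because the masked patches are removed rather than replaced by a placeholder token, the sequence passed to the image encoder is shorter, which is where the compute saving would come from in a setup like this.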