Y Yang, W Huang, Y Wei, H Peng… - Proceedings of the …, 2023 - openaccess.thecvf.com
In vision-language modeling, image token removal is an efficient augmentation technique to
reduce the cost of encoding image features. The CLIP-style models, however, have been …
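
To make "image token removal" concrete, here is a minimal sketch of uniform random token removal on a ViT-style sequence of patch tokens. This is an illustrative assumption, not the method of the paper above; the snippet does not say how tokens are chosen. The helper name `remove_image_tokens` and its `keep_ratio` parameter are hypothetical. The image encoder would then process only the kept tokens, which is where the encoding cost is saved.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def remove_image_tokens(
    tokens: Sequence[T], keep_ratio: float, seed: Optional[int] = None
) -> List[T]:
    """Hypothetical helper: uniformly drop image patch tokens at random.

    Keeps round(keep_ratio * len(tokens)) tokens (at least one) and preserves
    their original order, so positional information can still be matched up.
    Uniform random selection is an assumed strategy for illustration only.
    """
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not tokens:
        return []
    rng = random.Random(seed)  # local RNG so results are reproducible per seed
    n_keep = max(1, round(keep_ratio * len(tokens)))
    # Sample distinct indices, then sort them to keep the tokens in order.
    kept_idx = sorted(rng.sample(range(len(tokens)), n_keep))
    return [tokens[i] for i in kept_idx]


if __name__ == "__main__":
    # A 224x224 image with 16x16 patches yields a 14x14 grid = 196 tokens.
    patch_tokens = list(range(196))
    kept = remove_image_tokens(patch_tokens, keep_ratio=0.5, seed=0)
    print(len(kept))  # half the tokens are passed on to the encoder
```

With `keep_ratio=0.5`, the encoder sees 98 of the 196 tokens. Because self-attention cost grows with sequence length, shrinking the sequence reduces compute, at the price of the encoder seeing only part of each image.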