L Li, Z Gan, K Lin, CC Lin, Z Liu, C Liu… - arXiv e-prints, 2022 - ui.adsabs.harvard.edu
Unified vision-language frameworks have greatly advanced in recent years, most of which adopt an encoder-decoder architecture to unify image-text tasks as sequence-to-sequence …