F Zhang, S Qu, F Shi, C Xu - ACM Multimedia 2024 - openreview.net
10 days ago - … explicitly integrate the overlooked local visual representations into the global feature,
… , and conduct contrastive learning between the $\hat{T}_{gen}$ and the global visual and textual …