Y Liu, B Wan, L Ma, X He - … of the IEEE/CVF conference on …, 2021 - openaccess.thecvf.com
Visual grounding, which aims to build a correspondence between visual objects and their
language entities, plays a key role in cross-modal scene understanding. One promising and …