Tremendous variation in the scale of people/head size is a critical problem for crowd counting. To improve the scale invariance of feature representation, recent works …
K Zhang, Y Yang, J Yu, H Jiang, J Fan… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
In recent years, the growing demand for medical imaging diagnosis has placed a significant burden on radiologists. As a solution, Medical Vision-Language Pre-training (Med-VLP) …
Real-time semantic segmentation, which is essential for many practical applications, incorporates attention-based feature aggregation into lightweight structures to improve accuracy and …
T Chen, W Wang, Z Jiang, R Li… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Video corpus moment retrieval, which has recently become a hot topic, aims to localize video moments highly relevant to a given natural-language query description from …
Y Qin, N Pu, H Wu - IEEE Transactions on Multimedia, 2023 - ieeexplore.ieee.org
Multi-view subspace clustering aims to cluster the data lying in a union of subspaces with low dimensions. The commonly used spectral clustering performs the final clustering based …
Real-time perception, or streaming perception, is a crucial aspect of autonomous driving that has yet to be thoroughly explored in existing research. To address this gap, we present …
A video transformer naturally incurs a heavier computation burden than a static vision transformer, as the former processes a sequence T times longer than the latter under the …
Vireo @ TRecViD 2017: Video-to-text, ad-hoc video search and video hyperlinking. Singapore Management University, Institutional Knowledge at Singapore Management University …
Person clustering with multi-modal clues, including faces, bodies, and voices, is critical for various tasks, such as movie parsing and identity-based movie editing. Related methods …