GSRFormer: Grounded situation recognition transformer with alternate semantic attention refinement

ZQ Cheng, Q Dai, S Li, T Mitamura… - Proceedings of the 30th …, 2022 - dl.acm.org
Grounded Situation Recognition (GSR) aims to generate structured semantic summaries of
images for "human-like" event understanding. Specifically, GSR task not only detects the …

Improving the learning of multi-column convolutional neural network for crowd counting

ZQ Cheng, JX Li, Q Dai, X Wu, JY He… - Proceedings of the 27th …, 2019 - dl.acm.org
Tremendous variation in the scale of people/head size is a critical problem for crowd
counting. To improve the scale invariance of feature representation, recent works …

Multi-task paired masking with alignment modeling for medical vision-language pre-training

K Zhang, Y Yang, J Yu, H Jiang, J Fan… - IEEE Transactions …, 2023 - ieeexplore.ieee.org
In recent years, the growing demand for medical imaging diagnosis has placed a significant
burden on radiologists. As a solution, Medical Vision-Language Pre-training (Med-VLP) …

Real-time semantic segmentation with parallel multiple views feature augmentation

JJ Qiao, ZQ Cheng, X Wu, W Li, J Zhang - Proceedings of the 30th ACM …, 2022 - dl.acm.org
Real-time semantic segmentation is essential for many practical applications, which utilizes
attention-based feature aggregation into lightweight structures to improve accuracy and …

Cross-Modality Knowledge Calibration Network for Video Corpus Moment Retrieval

T Chen, W Wang, Z Jiang, R Li… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Video corpus moment retrieval has become a hot topic recently, which aims to localize a
consequent video moments highly relevant to the given query language description from …

EDMC: efficient multi-view clustering via cluster and instance space learning

Y Qin, N Pu, H Wu - IEEE Transactions on Multimedia, 2023 - ieeexplore.ieee.org
Multi-view subspace clustering aims to cluster the data lying in a union of subspaces with
low dimensions. The commonly used spectral clustering performs the final clustering based …

DAMO-StreamNet: Optimizing streaming perception in autonomous driving

JY He, ZQ Cheng, C Li, W Xiang, B Chen, B Luo… - arXiv preprint arXiv …, 2023 - arxiv.org
Real-time perception, or streaming perception, is a crucial aspect of autonomous driving that
has yet to be thoroughly explored in existing research. To address this gap, we present …

Long-term leap attention, short-term periodic shift for video classification

H Zhang, L Cheng, Y Hao, C Ngo - Proceedings of the 30th ACM …, 2022 - dl.acm.org
Video transformer naturally incurs a heavier computation burden than a static vision
transformer, as the former processes T times longer sequence than the latter under the …

VIREO @ TRECVID 2017: Video-to-text, ad-hoc video search and video hyperlinking

PA Nguyen, Q Li, ZQ Cheng, YJ Lu, H Zhang, X Wu… - 2017 - ink.library.smu.edu.sg

Relation-Aware Distribution Representation Network for Person Clustering with Multiple Modalities

K Liu, S Tang, Z Li, Z Li, L Bai, F Zhu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Person clustering with multi-modal clues, including faces, bodies, and voices, is critical for
various tasks, such as movie parsing and identity-based movie editing. Related methods …