On the selection of anchors and targets for video hyperlinking

ZQ Cheng, Q Dai, S Li, T Mitamura… - Proceedings of the 30th …, 2022 - dl.acm.org

Grounded Situation Recognition (GSR) aims to generate structured semantic summaries of
images for "human-like" event understanding. Specifically, GSR task not only detects the …

Cited by 41 · Related articles · All 6 versions

[PDF] acm.org

Improving the learning of multi-column convolutional neural network for crowd counting

ZQ Cheng, JX Li, Q Dai, X Wu, JY He… - Proceedings of the 27th …, 2019 - dl.acm.org

Tremendous variation in the scale of people/head size is a critical problem for crowd
counting. To improve the scale invariance of feature representation, recent works …

Cited by 97 · Related articles · All 8 versions

[PDF] arxiv.org

Multi-task paired masking with alignment modeling for medical vision-language pre-training

K Zhang, Y Yang, J Yu, H Jiang, J Fan… - IEEE Transactions …, 2023 - ieeexplore.ieee.org

In recent years, the growing demand for medical imaging diagnosis has placed a significant
burden on radiologists. As a solution, Medical Vision-Language Pre-training (Med-VLP) …

Cited by 20 · Related articles · All 4 versions

[PDF] openreview.net

Real-time semantic segmentation with parallel multiple views feature augmentation

JJ Qiao, ZQ Cheng, X Wu, W Li, J Zhang - Proceedings of the 30th ACM …, 2022 - dl.acm.org

Real-time semantic segmentation is essential for many practical applications, which utilizes
attention-based feature aggregation into lightweight structures to improve accuracy and …

Cited by 16 · Related articles · All 3 versions

Cross-Modality Knowledge Calibration Network for Video Corpus Moment Retrieval

T Chen, W Wang, Z Jiang, R Li… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Video corpus moment retrieval has become a hot topic recently, which aims to localize a
consequent video moments highly relevant to the given query language description from …

Cited by 7 · Related articles · All 2 versions

[PDF] google.com

EDMC: efficient multi-view clustering via cluster and instance space learning

Y Qin, N Pu, H Wu - IEEE Transactions on Multimedia, 2023 - ieeexplore.ieee.org

Multi-view subspace clustering aims to cluster the data lying in a union of subspaces with
low dimensions. The commonly used spectral clustering performs the final clustering based …

Cited by 13 · Related articles

[PDF] arxiv.org

DAMO-StreamNet: Optimizing streaming perception in autonomous driving

JY He, ZQ Cheng, C Li, W Xiang, B Chen, B Luo… - arXiv preprint arXiv …, 2023 - arxiv.org

Real-time perception, or streaming perception, is a crucial aspect of autonomous driving that
has yet to be thoroughly explored in existing research. To address this gap, we present …

Cited by 22 · Related articles · All 6 versions

[PDF] arxiv.org

Long-term leap attention, short-term periodic shift for video classification

H Zhang, L Cheng, Y Hao, C Ngo - Proceedings of the 30th ACM …, 2022 - dl.acm.org

Video transformer naturally incurs a heavier computation burden than a static vision
transformer, as the former processes T times longer sequence than the latter under the …

Cited by 14 · Related articles · All 4 versions

[PDF] smu.edu.sg

Vireo @ TRECVID 2017: Video-to-text, ad-hoc video search and video hyperlinking

PA Nguyen, Q Li, ZQ Cheng, YJ Lu, H Zhang, X Wu… - 2017 - ink.library.smu.edu.sg

Vireo @ TRECVID 2017: Video-to-text, ad-hoc video search and video hyperlinking.
Singapore Management University, Institutional Knowledge at Singapore Management University …

Cited by 39 · Related articles · All 6 versions

[PDF] arxiv.org

Relation-Aware Distribution Representation Network for Person Clustering with Multiple Modalities

K Liu, S Tang, Z Li, Z Li, L Bai, F Zhu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Person clustering with multi-modal clues, including faces, bodies, and voices, is critical for
various tasks, such as movie parsing and identity-based movie editing. Related methods …

Cited by 1 · Related articles · All 4 versions
