Review of large vision models and visual prompt engineering

J Wang, Z Liu, L Zhao, Z Wu, C Ma, S Yu, H Dai… - Meta-Radiology, 2023 - Elsevier
Visual prompt engineering is a fundamental methodology in the field of visual and image
artificial general intelligence. As the development of large vision models progresses, the …

A survey on graph neural networks for time series: Forecasting, classification, imputation, and anomaly detection

M Jin, HY Koh, Q Wen, D Zambon, C Alippi… - arXiv preprint arXiv …, 2023 - arxiv.org
Time series are the primary data type used to record dynamic system measurements and
generated in great volume by both physical sensors and online processes (virtual sensors) …

Large selective kernel network for remote sensing object detection

Y Li, Q Hou, Z Zheng, MM Cheng… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent research on remote sensing object detection has largely focused on improving the
representation of oriented bounding boxes but has overlooked the unique prior knowledge …

Vision-language models for vision tasks: A survey

J Zhang, J Huang, S Jin, S Lu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Most visual recognition studies rely heavily on crowd-labelled data for deep neural network
(DNN) training, and they usually train a DNN for each single visual recognition task …

AI-generated content (AIGC): A survey

J Wu, W Gan, Z Chen, S Wan, H Lin - arXiv preprint arXiv:2304.06632, 2023 - arxiv.org
To address the challenges of digital intelligence in the digital economy, artificial intelligence-
generated content (AIGC) has emerged. AIGC uses artificial intelligence to assist or replace …

UniPAD: A universal pre-training paradigm for autonomous driving

H Yang, S Zhang, D Huang, X Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
In the context of autonomous driving, the significance of effective feature learning is widely
acknowledged. While conventional 3D self-supervised pre-training methods have shown …

Improving zero-shot generalization for CLIP with synthesized prompts

Z Wang, J Liang, R He, N Xu… - Proceedings of the …, 2023 - openaccess.thecvf.com
With the growing interest in pretrained vision-language models like CLIP, recent research
has focused on adapting these models to downstream tasks. Despite achieving promising …

CTP: Towards vision-language continual pretraining via compatible momentum contrast and topology preservation

H Zhu, Y Wei, X Liang, C Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Abstract Vision-Language Pretraining (VLP) has shown impressive results on diverse
downstream tasks by offline training on large-scale datasets. Regarding the growing nature …

A comprehensive survey on segment anything model for vision and beyond

C Zhang, L Liu, Y Cui, G Huang, W Lin, Y Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Artificial intelligence (AI) is evolving towards artificial general intelligence, which refers to the
ability of an AI system to perform a wide range of tasks and exhibit a level of intelligence …

Multimodal large language models: A survey

J Wu, W Gan, Z Chen, S Wan… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
The exploration of multimodal language models integrates multiple data types, such as
images, text, language, audio, and other heterogeneous data. While the latest large language …