Review of large vision models and visual prompt engineering

J Wang, Z Liu, L Zhao, Z Wu, C Ma, S Yu, H Dai… - Meta-Radiology, 2023 - Elsevier
Visual prompt engineering is a fundamental methodology in the field of visual and image
artificial general intelligence. As the development of large vision models progresses, the …

A survey on graph neural networks for time series: Forecasting, classification, imputation, and anomaly detection

M Jin, HY Koh, Q Wen, D Zambon, C Alippi… - arXiv preprint arXiv …, 2023 - arxiv.org
Time series are the primary data type used to record dynamic system measurements and
generated in great volume by both physical sensors and online processes (virtual sensors) …

Large selective kernel network for remote sensing object detection

Y Li, Q Hou, Z Zheng, MM Cheng… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent research on remote sensing object detection has largely focused on improving the
representation of oriented bounding boxes but has overlooked the unique prior knowledge …

Vision-language models for vision tasks: A survey

J Zhang, J Huang, S Jin, S Lu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Most visual recognition studies rely heavily on crowd-labelled data for deep neural network
(DNN) training, and they usually train a DNN for each single visual recognition task …

AI-generated content (AIGC): A survey

J Wu, W Gan, Z Chen, S Wan, H Lin - arXiv preprint arXiv:2304.06632, 2023 - arxiv.org
To address the challenges of digital intelligence in the digital economy, artificial intelligence-
generated content (AIGC) has emerged. AIGC uses artificial intelligence to assist or replace …

UniPAD: A universal pre-training paradigm for autonomous driving

H Yang, S Zhang, D Huang, X Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
In the context of autonomous driving, the significance of effective feature learning is widely
acknowledged. While conventional 3D self-supervised pre-training methods have shown …

Improving zero-shot generalization for CLIP with synthesized prompts

Z Wang, J Liang, R He, N Xu… - Proceedings of the …, 2023 - openaccess.thecvf.com
With the growing interest in pretrained vision-language models like CLIP, recent research
has focused on adapting these models to downstream tasks. Despite achieving promising …

CTP: Towards vision-language continual pretraining via compatible momentum contrast and topology preservation

H Zhu, Y Wei, X Liang, C Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Abstract Vision-Language Pretraining (VLP) has shown impressive results on diverse
downstream tasks by offline training on large-scale datasets. Regarding the growing nature …

A comprehensive survey on segment anything model for vision and beyond

C Zhang, L Liu, Y Cui, G Huang, W Lin, Y Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
Artificial intelligence (AI) is evolving towards artificial general intelligence, which refers to the
ability of an AI system to perform a wide range of tasks and exhibit a level of intelligence …

Multimodal large language models: A survey

J Wu, W Gan, Z Chen, S Wan… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
The exploration of multimodal language models integrates multiple data types, such as
images, text, language, audio, and other heterogeneous data. While the latest large language …