Segment Anything Model for medical image segmentation: Current applications and future directions

Y Zhang, Z Shen, R Jiao - Computers in Biology and Medicine, 2024 - Elsevier
Due to the inherent flexibility of prompting, foundation models have emerged as the
predominant force in the fields of natural language processing and computer vision. The …

A survey on open-vocabulary detection and segmentation: Past, present, and future

C Zhu, L Chen - IEEE Transactions on Pattern Analysis and …, 2024 - ieeexplore.ieee.org
As the most fundamental scene understanding tasks, object detection and segmentation
have made tremendous progress in the deep learning era. Due to the expensive manual …

AlignSAM: Aligning Segment Anything Model to open context via reinforcement learning

D Huang, X Xiong, J Ma, J Li, Z Jie… - Proceedings of the …, 2024 - openaccess.thecvf.com
Powered by massive curated training data, the Segment Anything Model (SAM) has
demonstrated its impressive generalization capabilities in open-world scenarios with the …

CLIP as RNN: Segment countless visual concepts without training endeavor

S Sun, R Li, P Torr, X Gu, S Li - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Existing open-vocabulary image segmentation methods require a fine-tuning step on mask
labels and/or image-text datasets. Mask labels are labor-intensive, which limits the number of …

MaskClustering: View consensus based mask graph clustering for open-vocabulary 3D instance segmentation

M Yan, J Zhang, Y Zhu, H Wang - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Open-vocabulary 3D instance segmentation is cutting-edge for its ability to segment 3D
instances without predefined categories. However, progress in 3D lags behind its 2D …

Generative AI for visualization: State of the art and future directions

Y Ye, J Hao, Y Hou, Z Wang, S Xiao, Y Luo, W Zeng - Visual Informatics, 2024 - Elsevier
Generative AI (GenAI) has witnessed remarkable progress in recent years and
demonstrated impressive performance in various generation tasks in different domains such …

Open-vocabulary SAM: Segment and recognize twenty-thousand classes interactively

H Yuan, X Li, C Zhou, Y Li, K Chen, CC Loy - arXiv preprint arXiv …, 2024 - arxiv.org
The CLIP and Segment Anything Model (SAM) are remarkable vision foundation models
(VFMs). SAM excels in segmentation tasks across diverse domains, while CLIP is renowned …

Foundation models in smart agriculture: Basics, opportunities, and challenges

J Li, M Xu, L Xiang, D Chen, W Zhuang, X Yin… - … and Electronics in …, 2024 - Elsevier
The past decade has witnessed the rapid development and adoption of machine and deep
learning (ML & DL) methodologies in agricultural systems, showcased by great successes in …

OpenMEDLab: An open-source platform for multi-modality foundation models in medicine

X Wang, X Zhang, G Wang, J He, Z Li, W Zhu… - arXiv preprint arXiv …, 2024 - arxiv.org
The emerging trend of advancing generalist artificial intelligence, such as GPT-4 and
Gemini, has reshaped the landscape of research (academia and industry) in machine …

Building Vision-Language Models on Solid Foundations with Masked Distillation

S Sameni, K Kafle, H Tan… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Recent advancements in Vision-Language Models (VLMs) have marked a
significant leap in bridging the gap between computer vision and natural language …