Predicting deep zero-shot convolutional neural networks using textual descriptions

C Zhang, Z Yang, X He, L Deng - IEEE Journal of Selected …, 2020 - ieeexplore.ieee.org

Deep learning methods have revolutionized speech recognition, image recognition, and
natural language processing since 2010. Each of these tasks involves a single modality in …

Cited by 356 · Related articles · All 3 versions

[HTML] sciencedirect.com

[HTML] RS-CLIP: Zero shot remote sensing scene classification via contrastive vision-language supervision

X Li, C Wen, Y Hu, N Zhou - … Journal of Applied Earth Observation and …, 2023 - Elsevier

Zero-shot remote sensing scene classification aims to solve the scene classification problem
on unseen categories and has attracted numerous research attention in the remote sensing …

Cited by 19 · Related articles · All 3 versions

[PDF] thecvf.com

Conditional prompt learning for vision-language models

K Zhou, J Yang, CC Loy, Z Liu - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential
to investigate ways to adapt these models to downstream datasets. A recently proposed …

Cited by 1046 · Related articles · All 7 versions

[PDF] arxiv.org

Learning to prompt for vision-language models

K Zhou, J Yang, CC Loy, Z Liu - International Journal of Computer Vision, 2022 - Springer

Large pre-trained vision-language models like CLIP have shown great potential in learning
representations that are transferable across a wide range of downstream tasks. Different …

Cited by 1630 · Related articles · All 10 versions

[PDF] mlr.press

Learning transferable visual models from natural language supervision

A Radford, JW Kim, C Hallacy… - International …, 2021 - proceedings.mlr.press

State-of-the-art computer vision systems are trained to predict a fixed set of predetermined
object categories. This restricted form of supervision limits their generality and usability since …

Cited by 19345 · Related articles · All 20 versions

TN-ZSTAD: Transferable network for zero-shot temporal activity detection

L Zhang, X Chang, J Liu, M Luo, Z Li… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org

An integral part of video analysis and surveillance is temporal activity detection, which
means to simultaneously recognize and localize activities in long untrimmed videos …

Cited by 105 · Related articles · All 6 versions

A survey of zero-shot learning: Settings, methods, and applications

W Wang, VW Zheng, H Yu, C Miao - ACM Transactions on Intelligent …, 2019 - dl.acm.org

Most machine-learning methods focus on classifying instances whose classes have already
been seen in training. In practice, many applications require classifying instances whose …

Cited by 656 · Related articles · All 2 versions

[PDF] thecvf.com

f-vaegan-d2: A feature generating framework for any-shot learning

Y Xian, S Sharma, B Schiele… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

When labeled training data is scarce, a promising data augmentation approach is to
generate visual features of unknown classes using their attributes. To learn the class …

Cited by 562 · Related articles · All 13 versions

[PDF] thecvf.com

Learning to compare: Relation network for few-shot learning

F Sung, Y Yang, L Zhang, T Xiang… - Proceedings of the …, 2018 - openaccess.thecvf.com

We present a conceptually simple, flexible, and general framework for few-shot learning,
where a classifier must learn to recognise new classes given only few examples from each …

Cited by 4819 · Related articles · All 11 versions

[PDF] academia.edu

Referit3d: Neural listeners for fine-grained 3d object identification in real-world scenes

P Achlioptas, A Abdelreheem, F Xia… - Computer Vision–ECCV …, 2020 - Springer

In this work we study the problem of using referential language to identify common objects in
real-world 3D scenes. We focus on a challenging setup where the referred object belongs to …

Cited by 222 · Related articles · All 6 versions
