Multimodal intelligence: Representation learning, information fusion, and applications

C Zhang, Z Yang, X He, L Deng - IEEE Journal of Selected …, 2020 - ieeexplore.ieee.org
Deep learning methods haverevolutionized speech recognition, image recognition, and
natural language processing since 2010. Each of these tasks involves a single modality in …

[HTML][HTML] RS-CLIP: Zero shot remote sensing scene classification via contrastive vision-language supervision

X Li, C Wen, Y Hu, N Zhou - … Journal of Applied Earth Observation and …, 2023 - Elsevier
Zero-shot remote sensing scene classification aims to solve the scene classification problem
on unseen categories and has attracted numerous research attention in the remote sensing …

Conditional prompt learning for vision-language models

K Zhou, J Yang, CC Loy, Z Liu - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential
to investigate ways to adapt these models to downstream datasets. A recently proposed …

Learning to prompt for vision-language models

K Zhou, J Yang, CC Loy, Z Liu - International Journal of Computer Vision, 2022 - Springer
Large pre-trained vision-language models like CLIP have shown great potential in learning
representations that are transferable across a wide range of downstream tasks. Different …

Learning transferable visual models from natural language supervision

A Radford, JW Kim, C Hallacy… - International …, 2021 - proceedings.mlr.press
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined
object categories. This restricted form of supervision limits their generality and usability since …

TN-ZSTAD: Transferable network for zero-shot temporal activity detection

L Zhang, X Chang, J Liu, M Luo, Z Li… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org
An integral part of video analysis and surveillance is temporal activity detection, which
means to simultaneously recognize and localize activities in long untrimmed videos …

A survey of zero-shot learning: Settings, methods, and applications

W Wang, VW Zheng, H Yu, C Miao - ACM Transactions on Intelligent …, 2019 - dl.acm.org
Most machine-learning methods focus on classifying instances whose classes have already
been seen in training. In practice, many applications require classifying instances whose …

f-vaegan-d2: A feature generating framework for any-shot learning

Y Xian, S Sharma, B Schiele… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
When labeled training data is scarce, a promising data augmentation approach is to
generate visual features of unknown classes using their attributes. To learn the class …

Learning to compare: Relation network for few-shot learning

F Sung, Y Yang, L Zhang, T Xiang… - Proceedings of the …, 2018 - openaccess.thecvf.com
We present a conceptually simple, flexible, and general framework for few-shot learning,
where a classifier must learn to recognise new classes given only few examples from each …

Referit3d: Neural listeners for fine-grained 3d object identification in real-world scenes

P Achlioptas, A Abdelreheem, F Xia… - Computer Vision–ECCV …, 2020 - Springer
In this work we study the problem of using referential language to identify common objects in
real-world 3D scenes. We focus on a challenging setup where the referred object belongs to …