Self-supervised learning for videos: A survey

MC Schiappa, YS Rawat, M Shah - ACM Computing Surveys, 2023 - dl.acm.org
The remarkable success of deep learning in various domains relies on the availability of
large-scale annotated datasets. However, obtaining annotations is expensive and requires …

Speaker recognition based on deep learning: An overview

Z Bai, XL Zhang - Neural Networks, 2021 - Elsevier
Speaker recognition is a task of identifying persons from their voices. Recently, deep
learning has dramatically revolutionized speaker recognition. However, there is a lack of …

LightGlue: Local feature matching at light speed

P Lindenberger, PE Sarlin… - Proceedings of the …, 2023 - openaccess.thecvf.com
We introduce LightGlue, a deep neural network that learns to match local features across
images. We revisit multiple design decisions of SuperGlue, the state of the art in sparse …

Patch-NetVLAD: Multi-scale fusion of locally-global descriptors for place recognition

S Hausler, S Garg, M Xu, M Milford… - Proceedings of the …, 2021 - openaccess.thecvf.com
Visual Place Recognition is a challenging task for robotics and autonomous
systems, which must deal with the twin problems of appearance and viewpoint change in an …

R2Former: Unified retrieval and reranking transformer for place recognition

S Zhu, L Yang, C Chen, M Shah… - Proceedings of the …, 2023 - openaccess.thecvf.com
Visual Place Recognition (VPR) estimates the location of query images by matching
them with images in a reference database. Conventional methods generally adopt …

Object-centric learning with slot attention

F Locatello, D Weissenborn… - Advances in neural …, 2020 - proceedings.neurips.cc
Learning object-centric representations of complex scenes is a promising step towards
enabling efficient abstract reasoning from low-level perceptual features. Yet, most deep …

Rethinking visual geo-localization for large-scale applications

G Berton, C Masone, B Caputo - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Visual Geo-localization (VG) is the task of estimating the position where a given photo was
taken by comparing it with a large database of images of known locations. To investigate …

CLIP2Video: Mastering video-text retrieval via image CLIP

H Fang, P Xiong, L Xu, Y Chen - arXiv preprint arXiv:2106.11097, 2021 - arxiv.org
We present CLIP2Video network to transfer the image-language pre-training model to video-
text retrieval in an end-to-end manner. Leading approaches in the domain of video-and …

Deep learning for 3D point clouds: A survey

Y Guo, H Wang, Q Hu, H Liu, L Liu… - IEEE transactions on …, 2020 - ieeexplore.ieee.org
Point cloud learning has lately attracted increasing attention due to its wide applications in
many areas, such as computer vision, autonomous driving, and robotics. As a dominating …

Back to the feature: Learning robust camera localization from pixels to pose

PE Sarlin, A Unagar, M Larsson… - Proceedings of the …, 2021 - openaccess.thecvf.com
Camera pose estimation in known scenes is a 3D geometry task recently tackled by multiple
learning algorithms. Many regress precise geometric quantities, like poses or 3D points …