A review of multimodal image matching: Methods and applications

X Jiang, J Ma, G Xiao, Z Shao, X Guo - Information Fusion, 2021 - Elsevier
Multimodal image matching, which refers to identifying and then corresponding the same or
similar structure/content from two or more images that are of significant modalities or …

Vision-based robotic grasping from object localization, object pose estimation to grasp estimation for parallel grippers: a review

G Du, K Wang, S Lian, K Zhao - Artificial Intelligence Review, 2021 - Springer
This paper presents a comprehensive survey on vision-based robotic grasping. We
conclude three key tasks during vision-based robotic grasping, which are object localization …

Emergent correspondence from image diffusion

L Tang, M Jia, Q Wang, CP Phoo… - Advances in Neural …, 2023 - proceedings.neurips.cc
Finding correspondences between images is a fundamental problem in computer vision. In
this paper, we show that correspondence emerges in image diffusion models without any …

LightGlue: Local feature matching at light speed

P Lindenberger, PE Sarlin… - Proceedings of the …, 2023 - openaccess.thecvf.com
We introduce LightGlue, a deep neural network that learns to match local features across
images. We revisit multiple design decisions of SuperGlue, the state of the art in sparse …

Foundation models in robotics: Applications, challenges, and the future

R Firoozi, J Tucker, S Tian… - … Journal of Robotics …, 2023 - journals.sagepub.com
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …

Tracking everything everywhere all at once

Q Wang, YY Chang, R Cai, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present a new test-time optimization method for estimating dense and long-range motion
from a video sequence. Prior optical flow or particle video tracking algorithms typically …

Ego4D: Around the world in 3,000 hours of egocentric video

K Grauman, A Westbury, E Byrne… - Proceedings of the …, 2022 - openaccess.thecvf.com
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …

DROID-SLAM: Deep visual SLAM for monocular, stereo, and RGB-D cameras

Z Teed, J Deng - Advances in neural information …, 2021 - proceedings.neurips.cc
We introduce DROID-SLAM, a new deep learning based SLAM system. DROID-SLAM
consists of recurrent iterative updates of camera pose and pixelwise depth through a Dense …

SPARF: Neural radiance fields from sparse and noisy poses

P Truong, MJ Rakotosaona… - Proceedings of the …, 2023 - openaccess.thecvf.com
Neural Radiance Field (NeRF) has recently emerged as a powerful representation
to synthesize photorealistic novel views. While showing impressive performance, it relies on …

LoFTR: Detector-free local feature matching with transformers

J Sun, Z Shen, Y Wang, H Bao… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present a novel method for local image feature matching. Instead of performing image
feature detection, description, and matching sequentially, we propose to first establish pixel …