Drive anywhere: Generalizable end-to-end autonomous driving with multi-modal foundation models

TH Wang, A Maalouf, W Xiao, Y Ban… - … on Robotics and …, 2024 - ieeexplore.ieee.org
As autonomous driving technology matures, end-to-end methodologies have emerged as a
leading strategy, promising seamless integration from perception to control via deep …

PTQ4SAM: Post-Training Quantization for Segment Anything

C Lv, H Chen, J Guo, Y Ding… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Segment Anything Model (SAM) has achieved impressive performance in many
computer vision tasks. However, as a large-scale model, the immense memory and …

Robohop: Segment-based topological map representation for open-world visual navigation

S Garg, K Rana, M Hosseinzadeh, L Mares… - arXiv preprint arXiv …, 2024 - arxiv.org
Mapping is crucial for spatial reasoning, planning, and robot navigation. Existing approaches
range from metric maps, which require precise geometry-based optimization, to purely topological …

Gaussian Splatting to Real World Flight Navigation Transfer with Liquid Networks

A Quach, M Chahine, A Amini, R Hasani… - arXiv preprint arXiv …, 2024 - arxiv.org
Simulators are powerful tools for autonomous robot learning as they offer scalable data
generation, flexible design, and optimization of trajectories. However, transferring behavior …

Multishot Structured-Light 3-D Scanning for Surfaces in Challenging Motion

M Duan, Y Zheng, Y Jin, J Zheng… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Challenging motion, resulting in serious motion artifacts, is a well-known problem in
structured-light (SL) 3-D scanning. Single-shot imaging or tracking interframe offsets …

Probing Multimodal LLMs as World Models for Driving

S Sreeram, TH Wang, A Maalouf, G Rosman… - arXiv preprint arXiv …, 2024 - arxiv.org
We provide a sober look at the application of Multimodal Large Language Models (MLLMs)
within the domain of autonomous driving and challenge/verify some common assumptions …

Multi-Scale and Detail-Enhanced Segment Anything Model for Salient Object Detection

S Gao, P Zhang, T Yan, H Lu - arXiv preprint arXiv:2408.04326, 2024 - arxiv.org
Salient Object Detection (SOD) aims to identify and segment the most prominent objects in
images. Advanced SOD methods often utilize various Convolutional Neural Networks (CNN) …

Watching Swarm Dynamics from Above: A Framework for Advanced Object Tracking in Drone Videos

D Pham, M Hansen, F Dhellemmens, J Krause… - arXiv preprint arXiv …, 2024 - arxiv.org
Easily accessible sensing platforms, such as drones with diverse onboard sensors, have greatly
expanded the study of animal behavior in natural environments. Yet, analyzing vast, unlabeled video data …

SAMS: One-Shot Learning for the Segment Anything Model Using Similar Images

F Yin, J Li, Y Wei, W Zhang, C Xu - 2024 International Joint …, 2024 - ieeexplore.ieee.org
The field of computer vision is currently transitioning from closed-set to open-set tasks.
Vision foundation models have already demonstrated success in open-set scenarios …

Track Anything Rapter (TAR)

TV Puthanveettil - arXiv preprint arXiv:2405.11655, 2024 - arxiv.org
Object tracking is a fundamental task in computer vision with broad practical applications
across various domains, including traffic monitoring, robotics, and autonomous vehicle …