Multi-modal data-efficient 3D scene understanding for autonomous driving

L Kong, X Xu, J Ren, W Zhang, L Pan, K Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Efficient data utilization is crucial for advancing 3D scene understanding in autonomous
driving, where reliance on heavily human-annotated LiDAR point clouds challenges fully …

Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives

S Luo, W Chen, W Tian, R Liu, L Hou… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Foundation models have indeed made a profound impact on various fields, emerging as
pivotal components that significantly shape the capabilities of intelligent systems. In the …

Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability

S Gao, J Yang, L Chen, K Chitta, Y Qiu… - arXiv preprint arXiv …, 2024 - arxiv.org
World models can foresee the outcomes of different actions, which is of paramount
importance for autonomous driving. Nevertheless, existing driving world models still have …

Integration of Mixture of Experts and Multimodal Generative AI in Internet of Vehicles: A Survey

M Xu, D Niyato, J Kang, Z Xiong, A Jamalipour… - arXiv preprint arXiv …, 2024 - arxiv.org
Generative AI (GAI) can enhance the cognitive, reasoning, and planning capabilities of
intelligent modules in the Internet of Vehicles (IoV) by synthesizing augmented datasets …

DriVLMe: Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experiences

Y Huang, J Sansom, Z Ma, F Gervits, J Chai - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in foundation models (FMs) have unlocked new prospects in
autonomous driving, yet the experimental settings of these studies are preliminary, over …

BDC-Occ: Binarized Deep Convolution Unit For Binarized Occupancy Network

Z Zhang, Z Xu, W Yang, Q Liao, JH Xue - arXiv preprint arXiv:2405.17037, 2024 - arxiv.org
Existing 3D occupancy networks demand significant hardware resources, hindering the
deployment of edge devices. Binarized Neural Networks (BNN) offer substantially reduced …

MOSS: Motion-based 3D Clothed Human Synthesis from Monocular Video

H Wang, X Cai, X Sun, J Yue, S Zhang, F Lin… - arXiv preprint arXiv …, 2024 - arxiv.org
Single-view clothed human reconstruction holds a central position in virtual reality
applications, especially in contexts involving intricate human motions. It presents notable …