Flora: Dual-frequency loss-compensated real-time monocular 3D video reconstruction

L Wang, Y Gong, Q Wang, K Zhou… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
In this work, we propose a real-time monocular 3D video reconstruction approach named
Flora for reconstructing delicate and complete 3D scenes from RGB video sequences in an …

Addressing optimism bias in sequence modeling for reinforcement learning

AR Villaflor, Z Huang, S Pande… - international …, 2022 - proceedings.mlr.press
Impressive results in natural language processing (NLP) based on the Transformer neural
network architecture have inspired researchers to explore viewing offline reinforcement …

Attention-based interrelation modeling for explainable automated driving

Z Zhang, R Tian, R Sherony… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Automated driving desires better performance on tasks like motion planning and interacting
with pedestrians in mixed-traffic environments. Deep learning algorithms can achieve high …

Transformers in 3D point clouds: A survey

D Lu, Q Xie, M Wei, K Gao, L Xu, J Li - arXiv preprint arXiv:2205.07417, 2022 - arxiv.org
Transformers have been at the heart of the Natural Language Processing (NLP) and
Computer Vision (CV) revolutions. The significant success in NLP and CV inspired exploring …

Multi-modal policy fusion for end-to-end autonomous driving

Z Huang, S Sun, J Zhao, L Mao - Information Fusion, 2023 - Elsevier
Multi-modal learning has made impressive progress in autonomous driving by leveraging
information from multiple sensors. Existing feature fusion methods make decisions by …

POCE: Primal Policy Optimization with Conservative Estimation for Multi-constraint Offline Reinforcement Learning

J Guan, L Shen, A Zhou, L Li, H Hu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multi-constraint offline reinforcement learning (RL) promises to learn policies that satisfy
both cumulative and state-wise costs from offline datasets. This arrangement provides an …

DriveLM: Driving with graph visual question answering

C Sima, K Renz, K Chitta, L Chen, H Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
We study how vision-language models (VLMs) trained on web-scale data can be integrated
into end-to-end driving systems to boost generalization and enable interactivity with human …

Glass segmentation with RGB-thermal image pairs

D Huo, J Wang, Y Qian, YH Yang - IEEE Transactions on Image …, 2023 - ieeexplore.ieee.org
This paper proposes a new glass segmentation method utilizing paired RGB and thermal
images. Due to the large difference between the transmission property of visible light and …

LIFT: Learning 4D LiDAR image fusion transformer for 3D object detection

Y Zeng, D Zhang, C Wang, Z Miao… - Proceedings of the …, 2022 - openaccess.thecvf.com
LiDAR and camera are two common sensors to collect data in time for 3D object detection
under the autonomous driving context. Though the complementary information across …

A comparative review on multi-modal sensors fusion based on deep learning

Q Tang, J Liang, F Zhu - Signal Processing, 2023 - Elsevier
The wide deployment of multi-modal sensors in various areas generates vast amounts of
data with characteristics of high volume, wide variety, and high integrity. However, traditional …