Recent advances on loss functions in deep learning for computer vision

Y Tian, D Su, S Lauria, X Liu - Neurocomputing, 2022 - Elsevier
The loss function, also known as cost function, is used for training a neural network or other
machine learning models. Over the past decade, researchers have designed many loss …

End-to-end autonomous driving: Challenges and frontiers

L Chen, P Wu, K Chitta, B Jaeger… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
The autonomous driving community has witnessed a rapid growth in approaches that
embrace an end-to-end algorithm framework, utilizing raw sensor input to generate vehicle …

Diffusiondet: Diffusion model for object detection

S Chen, P Sun, Y Song, P Luo - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We propose DiffusionDet, a new framework that formulates object detection as a denoising
diffusion process from noisy boxes to object boxes. During the training stage, object boxes …

Enabling resource-efficient aiot system with cross-level optimization: A survey

S Liu, B Guo, C Fang, Z Wang, S Luo… - … Surveys & Tutorials, 2023 - ieeexplore.ieee.org
The emerging field of artificial intelligence of things (AIoT, AI+ IoT) is driven by the
widespread use of intelligent infrastructures and the impressive success of deep learning …

Detecting twenty-thousand classes using image-level supervision

X Zhou, R Girdhar, A Joulin, P Krähenbühl… - European Conference on …, 2022 - Springer
Current object detectors are limited in vocabulary size due to the small scale of detection
datasets. Image classifiers, on the other hand, reason about much larger vocabularies, as …

Deep long-tailed learning: A survey

Y Zhang, B Kang, B Hooi, S Yan… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Deep long-tailed learning, one of the most challenging problems in visual recognition, aims
to train well-performing deep models from a large number of images that follow a long-tailed …

Balanced contrastive learning for long-tailed visual recognition

J Zhu, Z Wang, J Chen, YPP Chen… - Proceedings of the …, 2022 - openaccess.thecvf.com
Real-world data typically follow a long-tailed distribution, where a few majority categories
occupy most of the data while most minority categories contain a limited number of samples …

Detecting everything in the open world: Towards universal object detection

Z Wang, Y Li, X Chen, SN Lim… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this paper, we formally address universal object detection, which aims to detect every
scene and predict every category. The dependence on human annotations, the limited …

Mdetr-modulated detection for end-to-end multi-modal understanding

A Kamath, M Singh, Y LeCun… - Proceedings of the …, 2021 - openaccess.thecvf.com
Multi-modal reasoning systems rely on a pre-trained object detector to extract regions of
interest from the image. However, this crucial module is typically used as a black box …

Memvit: Memory-augmented multiscale vision transformer for efficient long-term video recognition

CY Wu, Y Li, K Mangalam, H Fan… - Proceedings of the …, 2022 - openaccess.thecvf.com
While today's video recognition systems parse snapshots or short clips accurately, they
cannot connect the dots and reason across a longer range of time yet. Most existing video …