NTIRE 2024 challenge on short-form UGC video quality assessment: Methods and results

X Li, K Yuan, Y Pei, Y Lu, M Sun… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper reviews the NTIRE 2024 Challenge on Short-form UGC Video Quality
Assessment (S-UGC VQA) where various excellent solutions are submitted and evaluated …

Large selective kernel network for remote sensing object detection

Y Li, Q Hou, Z Zheng, MM Cheng… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent research on remote sensing object detection has largely focused on improving the
representation of oriented bounding boxes but has overlooked the unique prior knowledge …

YOLOv9: Learning what you want to learn using programmable gradient information

CY Wang, IH Yeh, HYM Liao - arXiv preprint arXiv:2402.13616, 2024 - arxiv.org
Today's deep learning methods focus on how to design the most appropriate objective
functions so that the prediction results of the model can be closest to the ground truth …

EVA-02: A visual representation for neon genesis

Y Fang, Q Sun, X Wang, T Huang, X Wang… - Image and Vision …, 2024 - Elsevier
We launch EVA-02, a next-generation Transformer-based visual representation pre-trained
to reconstruct strong and robust language-aligned vision features via masked image …

UniRepLKNet: A Universal Perception Large-Kernel ConvNet for Audio Video Point Cloud Time-Series and Image Recognition

X Ding, Y Zhang, Y Ge, S Zhao… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large-kernel convolutional neural networks (ConvNets) have recently received extensive
research attention but two unresolved and critical issues demand further investigation. 1) …

VanillaNet: the power of minimalism in deep learning

H Chen, Y Wang, J Guo, D Tao - Advances in Neural …, 2024 - proceedings.neurips.cc
At the heart of foundation models is the philosophy of "more is different", exemplified by the
astonishing success in computer vision and natural language processing. However, the …

Transformers in reinforcement learning: a survey

P Agarwal, AA Rahman, PL St-Charles… - arXiv preprint arXiv …, 2023 - arxiv.org
Transformers have significantly impacted domains like natural language processing,
computer vision, and robotics, where they improve performance compared to other neural …

Probing the 3D awareness of visual foundation models

M El Banani, A Raj, KK Maninis, A Kar… - Proceedings of the …, 2024 - openaccess.thecvf.com
Recent advances in large-scale pretraining have yielded visual foundation models with
strong capabilities. Not only can recent models generalize to arbitrary images for their …

Gold-YOLO: Efficient object detector via gather-and-distribute mechanism

C Wang, W He, Y Nie, J Guo, C Liu… - Advances in Neural …, 2024 - proceedings.neurips.cc
In the past years, YOLO-series models have emerged as the leading approaches in the area
of real-time object detection. Many studies pushed up the baseline to a higher level by …

NTIRE 2023 challenge on stereo image super-resolution: Methods and results

L Wang, Y Guo, Y Wang, J Li, S Gu… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this paper, we summarize the 2nd NTIRE challenge on stereo image super-resolution
(SR) with a focus on new solutions and results. The task of the challenge is to super-resolve …