To design fast neural networks, many works have been focusing on reducing the number of floating-point operations (FLOPs). We observe that such reduction in FLOPs, however, does …
Compared to the great progress of large-scale vision transformers (ViTs) in recent years, large-scale models based on convolutional neural networks (CNNs) are still in an early …
We launch EVA, a vision-centric foundation model to explore the limits of visual representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained …
Driven by improved architectures and better representation learning frameworks, the field of visual recognition has enjoyed rapid modernization and performance boost in the early …
We present SegNeXt, a simple convolutional network architecture for semantic segmentation. Recent transformer-based models have dominated the field of se-mantic …
J Xu, S Liu, A Vahdat, W Byeon… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies pre-trained text-image diffusion and discriminative models to perform open-vocabulary …
CY Wang, A Bochkovskiy… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Real-time object detection is one of the most important research topics in computer vision. As new approaches regarding architecture optimization and training optimization are …
Abstract This work presents Depth Anything a highly practical solution for robust monocular depth estimation. Without pursuing novel technical modules we aim to build a simple yet …