Segment Anything

A Kirillov, E Mintun, N Ravi, H Mao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Abstract We introduce the Segment Anything (SA) project: a new task, model, and dataset for
image segmentation. Using our efficient model in a data collection loop, we built the largest …

Multimodal C4: An open, billion-scale corpus of images interleaved with text

W Zhu, J Hessel, A Awadalla… - Advances in …, 2024 - proceedings.neurips.cc
In-context vision and language models like Flamingo support arbitrarily interleaved
sequences of images and text as input. This format not only enables few-shot learning via …

A comprehensive survey on techniques to handle face identity threats: challenges and opportunities

MK Rusia, DK Singh - Multimedia Tools and Applications, 2023 - Springer
The human face is considered the prime entity in recognizing a person's identity in our
society. Hence, the importance of face recognition systems is growing for many …

HyperExtended LightFace: A facial attribute analysis framework

SI Serengil, A Ozpinar - 2021 International Conference on …, 2021 - ieeexplore.ieee.org
Facial attribute analysis from facial images has always been a challenging task. Its practical
use cases are very diverse. This paper describes how to build machine learning models …

Generative multiplane images: Making a 2D GAN 3D-aware

X Zhao, F Ma, D Güera, Z Ren, AG Schwing… - … on Computer Vision, 2022 - Springer
What is really needed to make an existing 2D GAN 3D-aware? To answer this question, we
modify a classical GAN, i.e., StyleGANv2, as little as possible. We find that only two …

Spatial-frequency mutual learning for face super-resolution

C Wang, J Jiang, Z Zhong, X Liu - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Face super-resolution (FSR) aims to reconstruct high-resolution (HR) face images from the
low-resolution (LR) ones. With the advent of deep learning, the FSR technique has achieved …

Orthogonal adaptation for modular customization of diffusion models

R Po, G Yang, K Aberman… - Proceedings of the …, 2024 - openaccess.thecvf.com
Customization techniques for text-to-image models have paved the way for a wide range of
previously unattainable applications enabling the generation of specific concepts across …

Towards measuring fairness in AI: the Casual Conversations dataset

C Hazirbas, J Bitton, B Dolhansky, J Pan… - … and Identity Science, 2021 - ieeexplore.ieee.org
This paper introduces a novel dataset to help researchers evaluate their computer vision
and audio models for accuracy across a diverse set of ages, genders, apparent skin tones …

LatentCLR: A contrastive learning approach for unsupervised discovery of interpretable directions

OK Yüksel, E Simsar, EG Er… - Proceedings of the …, 2021 - openaccess.thecvf.com
Recent research has shown that it is possible to find interpretable directions in the latent
spaces of pre-trained Generative Adversarial Networks (GANs). These directions enable …

Audio-visual speech and gesture recognition by sensors of mobile devices

D Ryumin, D Ivanko, E Ryumina - Sensors, 2023 - mdpi.com
Audio-visual speech recognition (AVSR) is one of the most promising solutions for reliable
speech recognition, particularly when audio is corrupted by noise. Additional visual …