In-context vision and language models like Flamingo support arbitrarily interleaved sequences of images and text as input. This format not only enables few-shot learning via …
The human face is considered the prime entity in recognizing a person's identity in our society. Hence, face recognition systems are becoming increasingly important for many …
SI Serengil, A Ozpinar - 2021 International Conference on …, 2021 - ieeexplore.ieee.org
Facial attribute analysis from facial images has always been a challenging task, and its practical use cases vary widely. This paper describes how to build machine learning models …
What is really needed to make an existing 2D GAN 3D-aware? To answer this question, we modify a classical GAN, i.e., StyleGANv2, as little as possible. We find that only two …
C Wang, J Jiang, Z Zhong, X Liu - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Face super-resolution (FSR) aims to reconstruct high-resolution (HR) face images from the low-resolution (LR) ones. With the advent of deep learning, the FSR technique has achieved …
R Po, G Yang, K Aberman… - Proceedings of the …, 2024 - openaccess.thecvf.com
Customization techniques for text-to-image models have paved the way for a wide range of previously unattainable applications enabling the generation of specific concepts across …
This paper introduces a novel dataset to help researchers evaluate their computer vision and audio models for accuracy across a diverse set of ages, genders, apparent skin tones …
OK Yüksel, E Simsar, EG Er… - Proceedings of the …, 2021 - openaccess.thecvf.com
Recent research has shown that it is possible to find interpretable directions in the latent spaces of pre-trained Generative Adversarial Networks (GANs). These directions enable …
Audio-visual speech recognition (AVSR) is one of the most promising solutions for reliable speech recognition, particularly when audio is corrupted by noise. Additional visual …