Multimodal image synthesis and editing: A survey and taxonomy

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
As information exists in various modalities in the real world, effective interaction and fusion
among multimodal information plays a key role in the creation and perception of multimodal …

Deep audio-visual learning: A survey

H Zhu, MD Luo, R Wang, AH Zheng, R He - International Journal of …, 2021 - Springer
Audio-visual learning, aimed at exploiting the relationship between audio and visual
modalities, has drawn considerable attention since deep learning started to be used …

Street-view image generation from a bird's-eye view layout

A Swerdlow, R Xu, B Zhou - IEEE Robotics and Automation …, 2024 - ieeexplore.ieee.org
Bird's-Eye View (BEV) Perception has received increasing attention in recent years as it
provides a concise and unified spatial representation across views and benefits a diverse …

Image synthesis: a review of methods, datasets, evaluation metrics, and future outlook

SS Baraheem, TN Le, TV Nguyen - Artificial Intelligence Review, 2023 - Springer
Image synthesis is a process of converting the input text, sketch, or other sources, i.e., another
image or mask, into an image. It is an important problem in the computer vision field, where it …

OPT: Omni-perception pre-trainer for cross-modal understanding and generation

J Liu, X Zhu, F Liu, L Guo, Z Zhao, M Sun… - arXiv preprint arXiv …, 2021 - arxiv.org
In this paper, we propose an Omni-perception Pre-Trainer (OPT) for cross-modal
understanding and generation, by jointly modeling visual, text and audio resources. OPT is …

Conditional frechet inception distance

M Soloveitchik, T Diskin, E Morin, A Wiesel - arXiv preprint arXiv …, 2021 - arxiv.org
We consider distance functions between conditional distributions. We focus on the
Wasserstein metric and its Gaussian case known as the Frechet Inception Distance (FID) …

Vision + language applications: A survey

Y Zhou, N Shimada - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Text-to-image generation has attracted significant interest from researchers and practitioners
in recent years due to its widespread and diverse applications across various industries …

Semi-supervised reference-based sketch extraction using a contrastive learning framework

CW Seo, A Ashtari, J Noh - ACM Transactions on Graphics (TOG), 2023 - dl.acm.org
Sketches reflect the drawing style of individual artists; therefore, it is important to consider
their unique styles when extracting sketches from color images for various applications …

Optimizing SQL databases for big data workloads: techniques and best practices

A Uzzaman, MMI Jim, N Nishat… - Academic Journal on …, 2024 - researchgate.net
In the modern data-driven landscape, organizations are inundated with massive amounts of
data, necessitating robust and scalable database solutions (Arzamasova et al., 2020). SQL …

A survey on multimodal-guided visual content synthesis

Z Zhang, Z Li, K Wei, S Pan, C Deng - Neurocomputing, 2022 - Elsevier
With the increasing interest in various creative scenes such as social media, film production,
and intelligence courses, people expect to be able to compile rich visual content according …