Direct speech-to-image translation

F Zhan, Y Yu, R Wu, J Zhang, S Lu, L Liu… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org

As information exists in various modalities in real world, effective interaction and fusion
among multimodal information plays a key role for the creation and perception of multimodal …

Cited by 267 · Related articles · All 11 versions

Deep audio-visual learning: A survey

H Zhu, MD Luo, R Wang, AH Zheng, R He - International Journal of …, 2021 - Springer

Audio-visual learning, aimed at exploiting the relationship between audio and visual
modalities, has drawn considerable attention since deep learning started to be used …

Cited by 191 · Related articles · All 12 versions

Street-view image generation from a bird's-eye view layout

A Swerdlow, R Xu, B Zhou - IEEE Robotics and Automation …, 2024 - ieeexplore.ieee.org

Bird's-Eye View (BEV) Perception has received increasing attention in recent years as it
provides a concise and unified spatial representation across views and benefits a diverse …

Cited by 51 · Related articles · All 3 versions

Image synthesis: a review of methods, datasets, evaluation metrics, and future outlook

SS Baraheem, TN Le, TV Nguyen - Artificial Intelligence Review, 2023 - Springer

Image synthesis is a process of converting the input text, sketch, or other sources, i.e., another
image or mask, into an image. It is an important problem in the computer vision field, where it …

Cited by 28 · Related articles · All 6 versions

OPT: Omni-perception pre-trainer for cross-modal understanding and generation

J Liu, X Zhu, F Liu, L Guo, Z Zhao, M Sun… - arXiv preprint arXiv …, 2021 - arxiv.org

In this paper, we propose an Omni-perception Pre-Trainer (OPT) for cross-modal
understanding and generation, by jointly modeling visual, text and audio resources. OPT is …

Cited by 49 · Related articles · All 2 versions

Conditional frechet inception distance

M Soloveitchik, T Diskin, E Morin, A Wiesel - arXiv preprint arXiv …, 2021 - arxiv.org

We consider distance functions between conditional distributions. We focus on the
Wasserstein metric and its Gaussian case known as the Frechet Inception Distance (FID) …

Cited by 46 · Related articles · All 2 versions

Vision+ language applications: A survey

Y Zhou, N Shimada - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com

Text-to-image generation has attracted significant interest from researchers and practitioners
in recent years due to its widespread and diverse applications across various industries …

Cited by 10 · Related articles · All 6 versions

Semi-supervised reference-based sketch extraction using a contrastive learning framework

CW Seo, A Ashtari, J Noh - ACM Transactions on Graphics (TOG), 2023 - dl.acm.org

Sketches reflect the drawing style of individual artists; therefore, it is important to consider
their unique styles when extracting sketches from color images for various applications …

Cited by 8 · Related articles · All 4 versions

[PDF] Optimizing SQL databases for big data workloads: techniques and best practices

A Uzzaman, MMI Jim, N Nishat… - Academic Journal on …, 2024 - researchgate.net

In the modern data-driven landscape, organizations are inundated with massive amounts of
data, necessitating robust and scalable database solutions (Arzamasova et al., 2020). SQL …

Cited by 12 · Related articles

A survey on multimodal-guided visual content synthesis

Z Zhang, Z Li, K Wei, S Pan, C Deng - Neurocomputing, 2022 - Elsevier

With the increasing interest in various creative scenes such as social media, film production,
and intelligence courses, people expect to be able to compile rich visual content according …

Cited by 12 · Related articles · All 3 versions
