NUWA-LIP: language-guided image inpainting with defect-free VQGAN

M Ni, X Li, W Zuo - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Abstract Language-guided image inpainting aims to fill the defective regions of an image
under the guidance of text while keeping the non-defective regions unchanged. However …

Utilizing greedy nature for multimodal conditional image synthesis in transformers

S Su, J Zhu, L Gao, J Song - IEEE Transactions on Multimedia, 2023 - ieeexplore.ieee.org
Multimodal Conditional Image Synthesis (MCIS) aims to generate images according to
inputs from different modalities and their combinations, which allows users to describe their …

Exploring efficient few-shot adaptation for vision transformers

C Xu, S Yang, Y Wang, Z Wang, Y Fu, X Xue - arXiv preprint arXiv …, 2023 - arxiv.org
The task of Few-shot Learning (FSL) aims to perform inference on novel categories containing
only a few labeled examples, with the help of knowledge learned from base categories …

Human motionformer: Transferring human motions with vision transformers

H Liu, X Han, C Jin, L Qian, H Wei, Z Lin… - arXiv preprint arXiv …, 2023 - arxiv.org
Human motion transfer aims to transfer motions from a target dynamic person to a source
static one for motion synthesis. An accurate matching between the source person and the …

Asset: autoregressive semantic scene editing with transformers at high resolutions

D Liu, S Shetty, T Hinz, M Fisher, R Zhang… - ACM Transactions on …, 2022 - dl.acm.org
We present ASSET, a neural architecture for automatically modifying an input high-
resolution image according to a user's edits on its semantic segmentation map. Our …

Edibert, a generative model for image editing

T Issenhuth, U Tanielian, J Mary, D Picard - arXiv preprint arXiv …, 2021 - arxiv.org
Advances in computer vision are pushing the limits of image manipulation, with generative
models sampling detailed images on various tasks. However, a specialized model is often …

Elmformer: Efficient raw image restoration with a locally multiplicative transformer

J Ma, S Yan, L Zhang, G Wang, Q Zhang - Proceedings of the 30th ACM …, 2022 - dl.acm.org
In order to obtain high-quality raw images for downstream Image Signal Processing (ISP), in this
paper we present an Efficient Locally Multiplicative Transformer called ELMformer for raw …

ViR: Vision Retention Networks

A Hatamizadeh, M Ranzinger, J Kautz - arXiv preprint arXiv:2310.19731, 2023 - arxiv.org
Vision Transformers (ViTs) have gained considerable popularity in recent years, due to their
exceptional capabilities in modeling long-range spatial dependencies and scalability for …

QS-Craft: Learning to Quantize, Scrabble and Craft for Conditional Human Motion Animation

Y Hong, X Qian, S Luo, G Guo… - Proceedings of the …, 2022 - openaccess.thecvf.com
This paper studies the task of conditional Human Motion Animation (cHMA). Given a source
image and a driving video, the model should animate a new frame sequence, in which the …

Shaken, and Stirred: Long-Range Dependencies Enable Robust Outlier Detection with PixelCNN++

BM Umapathi, K Chauhan, P Shenoy… - arXiv preprint arXiv …, 2022 - arxiv.org
Reliable outlier detection is critical for real-world deployment of deep learning models.
Although extensively studied, likelihoods produced by deep generative models have been …