VBench: Comprehensive benchmark suite for video generative models

Z Huang, Y He, J Yu, F Zhang, C Si… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video generation has witnessed significant advancements, yet evaluating these models
remains a challenge. A comprehensive evaluation benchmark for video generation is …

Instruct-Imagen: Image generation with multi-modal instruction

H Hu, KCK Chan, YC Su, W Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper presents Instruct-Imagen, a model that tackles heterogeneous image
generation tasks and generalizes across unseen tasks. We introduce multi-modal instruction …

Shadows Don't Lie and Lines Can't Bend! Generative Models don't know Projective Geometry... for now

A Sarkar, H Mai, A Mahapatra… - Proceedings of the …, 2024 - openaccess.thecvf.com
Generative models can produce impressively realistic images. This paper demonstrates that
generated images have geometric features different from those of real images. We build a …

Photoswap: Personalized subject swapping in images

J Gu, Y Wang, N Zhao, TJ Fu, W Xiong… - Advances in …, 2024 - proceedings.neurips.cc
In an era where images and visual content dominate our digital landscape, the ability to
manipulate and personalize these images has become a necessity. Envision seamlessly …

Evaluating text-to-visual generation with image-to-text generation

Z Lin, D Pathak, B Li, J Li, X Xia, G Neubig… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite significant progress in generative AI, comprehensive evaluation remains
challenging because of the lack of effective metrics and standardized benchmarks. For …

LightIt: Illumination modeling and control for diffusion models

P Kocsis, J Philip, K Sunkavalli… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce LightIt, a method for explicit illumination control for image generation. Recent
generative methods lack lighting control, which is crucial to numerous artistic aspects of …

AnyV2V: A plug-and-play framework for any video-to-video editing tasks

M Ku, C Wei, W Ren, H Yang, W Chen - arXiv preprint arXiv:2403.14468, 2024 - arxiv.org
Video-to-video editing involves editing a source video along with additional control (such as
text prompts, subjects, or styles) to generate a new video that aligns with the source video …

VIEScore: Towards explainable metrics for conditional image synthesis evaluation

M Ku, D Jiang, C Wei, X Yue, W Chen - arXiv preprint arXiv:2312.14867, 2023 - arxiv.org
In the rapidly advancing field of conditional image generation research, challenges such as
limited explainability lie in effectively evaluating the performance and capabilities of various …

Holistic Evaluation for Interleaved Text-and-Image Generation

M Liu, Z Xu, Z Lin, T Ashby, J Rimchala, J Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Interleaved text-and-image generation has been an intriguing research direction, where the
models are required to generate both images and text pieces in an arbitrary order. Despite …

LSTP: Language-guided Spatial-Temporal Prompt Learning for Long-form Video-Text Understanding

Y Wang, Y Wang, P Wu, J Liang, D Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite progress in video-language modeling, the computational challenge of interpreting
long-form videos in response to task-specific linguistic queries persists, largely due to the …