InstantMesh: Efficient 3D mesh generation from a single image with sparse-view large reconstruction models

J Xu, W Cheng, Y Gao, X Wang, S Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
We present InstantMesh, a feed-forward framework for instant 3D mesh generation from a
single image, featuring state-of-the-art generation quality and significant training scalability …

Cycle3D: High-quality and consistent image-to-3D generation via generation-reconstruction cycle

Z Tang, J Zhang, X Cheng, W Yu, C Feng… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent 3D large reconstruction models typically employ a two-stage process: first generating
multi-view images with a multi-view diffusion model, and then utilizing a feed-forward …

TrAME: Trajectory-anchored multi-view editing for text-guided 3D Gaussian splatting manipulation

C Luo, D Di, X Yang, Y Ma, Z Xue, C Wei… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite significant strides in the field of 3D scene editing, current methods encounter
substantial challenges, particularly in preserving 3D consistency in multi-view editing …

LLMs Meet Multimodal Generation and Editing: A Survey

Y He, Z Liu, J Chen, Z Tian, H Liu, X Chi, R Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
With the recent advancement in large language models (LLMs), there is a growing interest in
combining LLMs with multimodal learning. Previous surveys of multimodal large language …

Puppet-master: Scaling interactive video generation as a motion prior for part-level dynamics

R Li, C Zheng, C Rupprecht, A Vedaldi - arXiv preprint arXiv:2408.04631, 2024 - arxiv.org
We present Puppet-Master, an interactive video generative model that can serve as a motion
prior for part-level dynamics. At test time, given a single image and a sparse set of motion …

CamCo: Camera-Controllable 3D-Consistent Image-to-Video Generation

D Xu, W Nie, C Liu, S Liu, J Kautz, Z Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, video diffusion models have emerged as expressive generative tools for high-
quality video content creation, readily available to general users. However, these models …

RIGI: Rectifying Image-to-3D Generation Inconsistency via Uncertainty-aware Learning

J Wang, Z Zheng, W Xu, P Liu - arXiv preprint arXiv:2411.18866, 2024 - arxiv.org
Given a single image of a target object, image-to-3D generation aims to reconstruct its
texture and geometric shape. Recent methods often utilize intermediate media, such as multi …

Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers

P Gao, L Zhuo, Z Lin, C Liu, J Chen, R Du, E Xie… - arXiv preprint arXiv …, 2024 - arxiv.org
Sora unveils the potential of scaling Diffusion Transformer for generating photorealistic
images and videos at arbitrary resolutions, aspect ratios, and durations, yet it still lacks …

Ouroboros3D: Image-to-3D Generation via 3D-aware Recursive Diffusion

H Wen, Z Huang, Y Wang, X Chen, Y Qiao… - arXiv preprint arXiv …, 2024 - arxiv.org
Existing single image-to-3D creation methods typically involve a two-stage process, first
generating multi-view images, and then using these images for 3D reconstruction. However …

GAF: Gaussian Avatar Reconstruction from Monocular Videos via Multi-view Diffusion

J Tang, D Davoli, T Kirschstein, L Schoneveld… - arXiv preprint arXiv …, 2024 - arxiv.org
We propose a novel approach for reconstructing animatable 3D Gaussian avatars from
monocular videos captured by commodity devices like smartphones. Photorealistic 3D head …