We present Stable Video 4D (SV4D), a latent video diffusion model for multi-frame and multi-view consistent dynamic 3D content generation. Unlike previous methods that rely on …
By leveraging the text-to-image diffusion prior, score distillation can synthesize 3D content without paired text-3D training data. Instead of spending hours of online optimization per text …
Y Chen, J Wang, Z Yang, S Manivasagam… - … on Computer Vision, 2025 - Springer
Large-scale 3D scene reconstruction is important for applications such as virtual reality and simulation. Existing neural rendering approaches (e.g., NeRF, 3DGS) have achieved realistic …
Z Tang, J Zhang, X Cheng, W Yu, C Feng… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent 3D large reconstruction models typically employ a two-stage process: first generating multi-view images with a multi-view diffusion model, and then utilizing a feed-forward …
C Zhang, H Song, Y Wei, Y Chen, J Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we introduce the Geometry-Aware Large Reconstruction Model (GeoLRM), an approach which can predict high-quality assets with 512k Gaussians and 21 input images in …
We introduce MeshAnything V2, an autoregressive transformer that generates Artist-Created Meshes (AM) aligned to given shapes. It can be integrated with various 3D asset production …
Reconstructing and rendering 3D objects from highly sparse views is of critical importance for promoting applications of 3D vision techniques and improving user experience …
The increasing demand for high-quality 3D assets across various industries necessitates efficient and automated 3D content creation. Despite recent advancements in 3D generative …
We propose the Large View Synthesis Model (LVSM), a novel transformer-based approach for scalable and generalizable novel view synthesis from sparse-view inputs. We introduce …