
Stable Video 3D: Quality Novel View Synthesis and 3D Generation from Single Images

Stable Video 3D introduces significant advancements in 3D generation, particularly in novel view synthesis (NVS). Unlike previous approaches, which often grapple with limited perspectives and inconsistent outputs, Stable Video 3D delivers coherent views from any given angle and generalizes well across objects. This capability not only enhances pose controllability but also ensures consistent object appearance across multiple views, both critical for realistic and accurate 3D generation.
By adapting the Stable Video Diffusion image-to-video diffusion model with camera-path conditioning, Stable Video 3D can generate multi-view videos of an object. Using a video diffusion model, in contrast to the image diffusion model used in Stable Zero123, provides major benefits in generalization and view consistency of the generated outputs. By further combining these techniques with disentangled illumination optimization and a new masked score distillation sampling (SDS) loss, Stable Video 3D reliably produces quality 3D meshes from single-image inputs.
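To make the masked SDS idea concrete, here is a minimal sketch of how a per-pixel mask can modulate a score-distillation-style gradient so that unreliable regions contribute less to the 3D optimization. The function name, shapes, and weighting are illustrative assumptions for exposition, not SV3D's actual implementation.

```python
import numpy as np

def masked_sds_grad(noise_pred, noise, mask, weight=1.0):
    """Sketch of a masked score-distillation-style gradient.

    The standard SDS residual (noise_pred - noise) is scaled by a soft
    per-pixel mask, down-weighting or zeroing pixels where the diffusion
    model's guidance is considered unreliable. All names and shapes here
    are hypothetical, chosen only to illustrate the masking idea.
    """
    grad = weight * (noise_pred - noise)  # standard SDS residual
    return mask * grad                    # masked pixels contribute less

# Toy example: a 2x2 "image" with one pixel fully masked out.
noise_pred = np.array([[1.0, 2.0], [3.0, 4.0]])
noise = np.array([[0.5, 1.0], [1.0, 1.0]])
mask = np.array([[1.0, 1.0], [0.0, 1.0]])  # bottom-left pixel ignored
g = masked_sds_grad(noise_pred, noise, mask)
```

In a real pipeline this gradient would be backpropagated through the rendered image into the NeRF or mesh parameters; the sketch only shows where the mask enters.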
Stable Video 3D leverages this multi-view consistency to optimize 3D Neural Radiance Field (NeRF) and mesh representations, improving the quality of 3D meshes generated directly from the novel views. Additionally, to reduce the issue of baked-in lighting, Stable Video 3D employs a disentangled illumination model that is jointly optimized along with 3D shape and texture.
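The disentangled-illumination idea can be sketched as factoring each rendered color into a lighting-independent albedo multiplied by an irradiance term from a low-order lighting model, with the lighting coefficients optimized alongside shape and texture so shadows can migrate out of the texture. The band-0/1 spherical-harmonics basis below is a common choice for such a lighting model, but the exact parameterization here is an assumption for illustration.

```python
import numpy as np

def sh_irradiance(normals, sh_coeffs):
    """Evaluate a simple band-0/1 spherical-harmonics lighting model.

    normals: (N, 3) unit surface normals; sh_coeffs: (4,) learned
    lighting coefficients (constant term plus three linear terms).
    The basis is an unnormalized sketch, not a production SH evaluator.
    """
    x, y, z = normals[..., 0], normals[..., 1], normals[..., 2]
    basis = np.stack([np.ones_like(x), y, z, x], axis=-1)  # l=0 and l=1 terms
    return basis @ sh_coeffs

def shaded_color(albedo, normals, sh_coeffs):
    # Rendered color = albedo * irradiance. Because sh_coeffs are
    # optimized jointly with albedo, baked-in shading in the input
    # views can be explained by the lighting term instead of the texture.
    return albedo * sh_irradiance(normals, sh_coeffs)[..., None]

# With only the constant coefficient lit, shading reduces to the albedo.
normals = np.array([[0.0, 0.0, 1.0]])
albedo = np.array([[0.2, 0.4, 0.6]])
flat_light = np.array([1.0, 0.0, 0.0, 0.0])
out = shaded_color(albedo, normals, flat_light)
```

During optimization, a reconstruction loss on `shaded_color` against the generated views would drive both `albedo` and `sh_coeffs`, which is the joint optimization the paragraph describes.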
