Tomas Jakab*,
Ruining Li*,
Shangzhe Wu,
Christian Rupprecht,
Andrea Vedaldi
University of Oxford
(* equal contribution)
In 3DV 2024
[Figure: Monocular 3D reconstruction (input image, 3D shape, textured) and controllable synthesis (animation, relighting, texture swapping).]
Farm3D learns an articulated object category entirely from "free" virtual supervision provided by a 2D diffusion-based image generator. We propose a framework that uses an image generator, such as Stable Diffusion, to produce training data for learning a reconstruction network from scratch. The diffusion model also serves as a scoring mechanism to further improve learning. Our method yields a monocular reconstruction network that generates controllable 3D assets from a single input image, whether real or generated, in a matter of seconds.
We prompt Stable Diffusion for virtual views of an object category and use them to train a monocular articulated object reconstruction model. The model factorises the input image of an object instance into articulated shape, appearance (albedo, and diffuse and ambient intensities), viewpoint, and light direction. During training, we also sample synthetic instance views that are then "critiqued" by Stable Diffusion to guide the learning.
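To make the appearance factors above concrete, the sketch below shows one way albedo, ambient and diffuse intensities, and light direction could be recombined into shaded colours. It assumes a simple Lambertian shading model, which the text does not spell out; the function name `shade` and the array layouts are illustrative and not taken from the Farm3D code.

```python
import numpy as np

def shade(albedo, normals, light_dir, ambient, diffuse):
    """Assumed Lambertian model: colour = albedo * (ambient + diffuse * max(0, n . l)).

    albedo:    (N, 3) RGB albedo per surface point
    normals:   (N, 3) unit surface normals
    light_dir: (3,)   direction pointing towards the light
    ambient, diffuse: scalar intensities
    """
    light_dir = np.asarray(light_dir, dtype=float)
    light_dir = light_dir / np.linalg.norm(light_dir)  # normalise the light direction
    n_dot_l = np.clip(normals @ light_dir, 0.0, None)  # back-facing points get no diffuse light
    shading = ambient + diffuse * n_dot_l              # (N,) scalar shading per point
    return albedo * shading[:, None]                   # broadcast shading over RGB channels

# Usage: one point facing the light, one facing away.
albedo = np.ones((2, 3))
normals = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
colours = shade(albedo, normals, light_dir=[0.0, 0.0, 2.0], ambient=0.2, diffuse=0.8)
```

Under this assumed model, the point facing away from the light receives only the ambient term, which is why separating ambient and diffuse intensities from albedo makes the light direction independently editable.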
Farm3D reconstructs the shapes of a wide range of categories down to fine details such as legs and ears, despite not being trained on any real images.
Our method enables the generation of controllable 3D assets from either a real image or an image synthesised using Stable Diffusion. Once an asset is generated, its lighting can be adjusted, its texture can be swapped with that of another model of the same category, and its shape can be animated.
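Because the reconstruction is factorised, each of these edits amounts to changing one factor while leaving the others untouched. The sketch below illustrates that idea only; the `FactorisedAsset` container, its field names, and the tuple stand-ins for meshes and textures are hypothetical and do not reflect the actual Farm3D data structures.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class FactorisedAsset:
    # All fields are simplified stand-ins for illustration.
    shape: tuple      # articulated mesh (placeholder)
    pose: tuple       # articulation parameters (placeholder)
    albedo: tuple     # texture (placeholder)
    light_dir: tuple  # light direction

def relight(asset, light_dir):
    """Change only the light direction."""
    return replace(asset, light_dir=tuple(light_dir))

def swap_texture(asset, donor):
    """Take the albedo from another asset of the same category."""
    return replace(asset, albedo=donor.albedo)

def animate(asset, poses):
    """Produce one asset per pose; shape and appearance stay fixed."""
    return [replace(asset, pose=tuple(p)) for p in poses]

# Usage with toy values.
horse_a = FactorisedAsset(shape=("mesh_a",), pose=(0.0,), albedo=("brown",), light_dir=(0.0, 0.0, 1.0))
horse_b = FactorisedAsset(shape=("mesh_b",), pose=(0.0,), albedo=("white",), light_dir=(0.0, 0.0, 1.0))
relit = relight(horse_a, (1.0, 0.0, 0.0))
swapped = swap_texture(horse_a, horse_b)
frames = animate(horse_a, [(0.0,), (0.5,), (1.0,)])
```

Keeping the container immutable means each edit returns a new asset, so the original reconstruction remains available for further edits.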
[Figure: Animation examples (input image, animation).]
[Figure: Relighting examples (input image, relit).]
[Figure: Texture swapping examples (input image, texture image, texture swapping).]
We introduce a new 3D Articulated Animals Dataset to directly evaluate the quality of single-view 3D reconstruction of articulated animals. The dataset contains textured 3D meshes of articulated animals, including horses, cows, and sheep, crafted by a professional 3D artist and accompanied by realistic articulated poses.
@Article{jakab2023farm3d,
  title={{Farm3D}: Learning Articulated 3D Animals by Distilling 2D Diffusion},
  author={Jakab, Tomas and Li, Ruining and Wu, Shangzhe and Rupprecht, Christian and Vedaldi, Andrea},
  journal={arXiv preprint arXiv:2304.10535},
  year={2023},
}