NPGA: Neural Parametric Gaussian Avatars

May 29, 2024 · Declared Dead · 🏛 ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors Simon Giebenhain, Tobias Kirschstein, Martin Rünz, Lourdes Agapito, Matthias Nießner arXiv ID 2405.19331 Category cs.CV: Computer Vision Cross-listed cs.AI, cs.GR Citations 37 Venue ACM SIGGRAPH Conference and Exhibition on Computer Graphics and Interactive Techniques in Asia Last Checked 3 months ago

Abstract

The creation of high-fidelity, digital versions of human heads is an important stepping stone in the process of further integrating virtual components into our everyday lives. Constructing such avatars is a challenging research problem, due to a high demand for photo-realism and real-time rendering performance. In this work, we propose Neural Parametric Gaussian Avatars (NPGA), a data-driven approach to create high-fidelity, controllable avatars from multi-view video recordings. We build our method around 3D Gaussian splatting for its highly efficient rendering and to inherit the topological flexibility of point clouds. In contrast to previous work, we condition our avatars' dynamics on the rich expression space of neural parametric head models (NPHM), instead of mesh-based 3DMMs. To this end, we distill the backward deformation field of our underlying NPHM into forward deformations which are compatible with rasterization-based rendering. All remaining fine-scale, expression-dependent details are learned from the multi-view videos. For increased representational capacity of our avatars, we propose per-Gaussian latent features that condition each primitives dynamic behavior. To regularize this increased dynamic expressivity, we propose Laplacian terms on the latent features and predicted dynamics. We evaluate our method on the public NeRSemble dataset, demonstrating that NPGA significantly outperforms the previous state-of-the-art avatars on the self-reenactment task by 2.6 PSNR. Furthermore, we demonstrate accurate animation capabilities from real-world monocular videos.