Real-Time Radiance Fields for Single-Image Portrait View Synthesis

May 03, 2023 · Entered Twilight · 🏛 ACM Transactions on Graphics

"No code URL or promise found in abstract"
"Code repo scraped from project page (backfill)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, Dockerfile, LICENSE.txt, README.md, dataset_tool.py, dnnlib, docs, metrics, pretrained_networks.py, projector.py, run_generator.py, run_metrics.py, run_projector.py, run_training.py, test_nvcc.cu, training

Authors: Alex Trevithick, Matthew Chan, Michael Stengel, Eric R. Chan, Chao Liu, Zhiding Yu, Sameh Khamis, Manmohan Chandraker, Ravi Ramamoorthi, Koki Nagano
arXiv ID: 2305.02310
Category: cs.CV (Computer Vision)
Cross-listed: cs.AI, cs.GR, cs.LG
Citations: 90
Venue: ACM Transactions on Graphics
Repository: https://github.com/NVlabs/stylegan2 (⭐ 11182)
Last Checked: 14 days ago

Abstract

We present a one-shot method to infer and render a photorealistic 3D representation from a single unposed image (e.g., face portrait) in real time. Given a single RGB input, our image encoder directly predicts a canonical triplane representation of a neural radiance field for 3D-aware novel view synthesis via volume rendering. Our method is fast (24 fps) on consumer hardware, and produces higher quality results than strong GAN-inversion baselines that require test-time optimization. To train our triplane encoder pipeline, we use only synthetic data, showing how to distill the knowledge from a pretrained 3D GAN into a feedforward encoder. Technical contributions include a Vision Transformer-based triplane encoder, a camera data augmentation strategy, and a well-designed loss function for synthetic data training. We benchmark against state-of-the-art methods, demonstrating significant improvements in robustness and image quality in challenging real-world settings. We showcase our results on portraits of faces (FFHQ) and cats (AFHQ), but our algorithm can also be applied in the future to other categories with a 3D-aware image generator.
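The abstract mentions two building blocks, a triplane representation and volume rendering, without spelling them out. The sketch below illustrates the general idea in plain Python and is not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`sample_plane`, `triplane_features`, `composite`), the nearest-neighbour lookup, the choice to sum the three plane features, and the omission of the decoder network that would map features to density and color.

```python
import math

def sample_plane(plane, u, v):
    """Nearest-neighbour lookup in a square feature plane; u, v lie in [-1, 1]."""
    res = len(plane)
    col = min(res - 1, max(0, int((u + 1.0) / 2.0 * res)))
    row = min(res - 1, max(0, int((v + 1.0) / 2.0 * res)))
    return plane[row][col]

def triplane_features(planes, point):
    """Project a 3D point onto the XY, XZ and YZ planes and sum the features.

    Summation is an assumed aggregation rule; other rules (e.g. averaging) are possible.
    """
    x, y, z = point
    samples = [
        sample_plane(planes["xy"], x, y),
        sample_plane(planes["xz"], x, z),
        sample_plane(planes["yz"], y, z),
    ]
    return [sum(channel) for channel in zip(*samples)]

def composite(densities, colors, deltas):
    """Standard volume-rendering quadrature along one ray (front to back)."""
    transmittance = 1.0
    pixel = [0.0, 0.0, 0.0]
    weights = []
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)   # opacity of this segment
        weight = transmittance * alpha           # contribution to the pixel
        weights.append(weight)
        pixel = [p + weight * c for p, c in zip(pixel, color)]
        transmittance *= 1.0 - alpha             # light remaining after the segment
    return pixel, weights
```

In a full pipeline of this kind, an encoder would produce the three planes from the input image, a small decoder would turn each sampled feature vector into a density and a color, and `composite` would be evaluated once per camera ray.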