Semantic Hierarchy Emerges in Deep Generative Representations for Scene Synthesis

November 21, 2019 · Declared Dead · 🏛 International Journal of Computer Vision

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Ceyuan Yang, Yujun Shen, Bolei Zhou
arXiv ID: 1911.09267
Category: cs.CV (Computer Vision)
Cross-listed: cs.GR, cs.LG
Citations: 209
Venue: International Journal of Computer Vision
Last Checked: 4 months ago

Abstract

Despite the success of Generative Adversarial Networks (GANs) in image synthesis, it is still not well understood what generative models learn inside their deep generative representations, or how photo-realistic images are composed from the layer-wise stochasticity introduced in recent GANs. In this work, we show that a highly structured semantic hierarchy emerges as variation factors when scenes are synthesized from the generative representations in state-of-the-art GAN models such as StyleGAN and BigGAN. By probing the layer-wise representations with a broad set of semantics at different abstraction levels, we are able to quantify the causality between the activations and the semantics occurring in the output image. This quantification identifies the human-understandable variation factors that GANs learn in order to compose scenes. The qualitative and quantitative results further suggest that the generative representations learned by GANs with layer-wise latent codes are specialized to synthesize different hierarchical semantics: the early layers tend to determine the spatial layout and configuration, the middle layers control the categorical objects, and the later layers render the scene attributes and the color scheme. Identifying such a set of manipulable latent variation factors facilitates semantic scene manipulation.
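
The layer-wise manipulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a generator that takes one latent code per layer (stored here as a NumPy array of shape `(num_layers, dim)`), a semantic direction that has already been found by some probing step, and a hypothetical split of 14 layers into "layout", "objects", and "attributes" groups. The function name `manipulate_layerwise`, the `LAYER_GROUPS` table, and the layer boundaries are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical grouping of a 14-layer generator, loosely following the
# early / middle / later split described in the abstract. The exact
# boundaries are an assumption, not taken from the paper.
LAYER_GROUPS = {
    "layout": range(0, 4),       # early layers
    "objects": range(4, 8),      # middle layers
    "attributes": range(8, 14),  # later layers
}


def manipulate_layerwise(layer_codes, direction, layers, strength=1.0):
    """Shift the latent codes of the selected layers along a semantic direction.

    layer_codes: array of shape (num_layers, dim), one latent code per layer.
    direction:   array of shape (dim,), normalized to unit length internally.
    layers:      iterable of layer indices to edit; other layers stay fixed.
    strength:    step size along the unit direction.
    """
    layer_codes = np.asarray(layer_codes, dtype=np.float64)
    direction = np.asarray(direction, dtype=np.float64)
    unit = direction / np.linalg.norm(direction)

    edited = layer_codes.copy()  # leave the input untouched
    edited[list(layers)] += strength * unit  # broadcast over selected layers
    return edited


# Example: push only the later ("attributes") layers along a direction.
rng = np.random.default_rng(0)
codes = rng.standard_normal((14, 512))
attribute_direction = rng.standard_normal(512)  # stand-in for a learned direction
edited_codes = manipulate_layerwise(
    codes, attribute_direction, LAYER_GROUPS["attributes"], strength=2.0
)
```

Feeding `edited_codes` to a layer-wise generator would, under these assumptions, change attribute-level semantics while the codes that drive layout and objects stay as they were.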