Controlling Style and Semantics in Weakly-Supervised Image Generation
December 06, 2019 · Entered Twilight · European Conference on Computer Vision
"Last commit was 5.0 years ago (≥5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: .gitignore, LICENSE.md, ManipulationDemo.ipynb, README.md, SETUP.md, SketchingDemo.ipynb, data, datasets, fid.py, images, models, options, precompute_captions.py, test.py, tools, train.py, trainers, util
Authors
Dario Pavllo, Aurelien Lucchi, Thomas Hofmann
arXiv ID
1912.03161
Category
cs.CV: Computer Vision
Citations
35
Venue
European Conference on Computer Vision
Repository
https://github.com/dariopavllo/style-semantics
⭐ 146
Last Checked
1 month ago
Abstract
We propose a weakly-supervised approach for conditional image generation of complex scenes where a user has fine control over objects appearing in the scene. We exploit sparse semantic maps to control object shapes and classes, as well as textual descriptions or attributes to control both local and global style. In order to condition our model on textual descriptions, we introduce a semantic attention module whose computational cost is independent of the image resolution. To further augment the controllability of the scene, we propose a two-step generation scheme that decomposes background and foreground. The label maps used to train our model are produced by a large-vocabulary object detector, which enables access to unlabeled data and provides structured instance information. In such a setting, we report better FID scores compared to fully-supervised settings where the model is trained on ground-truth semantic maps. We also showcase the ability of our model to manipulate a scene on complex datasets such as COCO and Visual Genome.
Similar Papers
In the same crypt: Computer Vision
Old Age
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
Old Age
SSD: Single Shot MultiBox Detector
Old Age
Squeeze-and-Excitation Networks
R.I.P.
Ghosted