Semantic Object Accuracy for Generative Text-to-Image Synthesis
October 29, 2019 · Entered Twilight · IEEE Transactions on Pattern Analysis and Machine Intelligence
"Last commit was 5.0 years ago (≥5 year threshold)"
Evidence collected by the PWNC Scanner
Repo contents: .gitignore, LICENSE, OP-GAN, README.md, SOA
Authors
Tobias Hinz, Stefan Heinrich, Stefan Wermter
arXiv ID
1910.13321
Category
cs.CV: Computer Vision
Cross-listed
cs.LG, cs.NE
Citations
181
Venue
IEEE Transactions on Pattern Analysis and Machine Intelligence
Repository
https://github.com/tohinz/semantic-object-accuracy-for-generative-text-to-image-synthesis
⭐ 105
Last Checked
1 month ago
Abstract
Generative adversarial networks conditioned on textual image descriptions are capable of generating realistic-looking images. However, current methods still struggle to generate images based on complex image captions from a heterogeneous domain. Furthermore, quantitatively evaluating these text-to-image models is challenging, as most evaluation metrics only judge image quality but not the conformity between the image and its caption. To address these challenges we introduce a new model that explicitly models individual objects within an image and a new evaluation metric called Semantic Object Accuracy (SOA) that specifically evaluates images given an image caption. The SOA uses a pre-trained object detector to evaluate if a generated image contains objects that are mentioned in the image caption, e.g. whether an image generated from "a car driving down the street" contains a car. We perform a user study comparing several text-to-image models and show that our SOA metric ranks the models the same way as humans, whereas other metrics such as the Inception Score do not. Our evaluation also shows that models which explicitly model objects outperform models which only model global image characteristics.
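The abstract describes SOA as a check of whether a pre-trained object detector finds, in each generated image, the object named in its caption. The following is a minimal sketch of that idea and not the authors' implementation. It assumes the detector has already been run, so each sample is a hypothetical pair of the caption's target label and the set of labels detected in the generated image. The function name `semantic_object_accuracy` and the two aggregations (averaged per class and averaged per image) are illustrative assumptions; the paper's exact definition may differ.

```python
from collections import defaultdict
from typing import Iterable, Set, Tuple


def semantic_object_accuracy(
    samples: Iterable[Tuple[str, Set[str]]],
) -> Tuple[float, float]:
    """Return (class_average, image_average) detection rates.

    Each sample is (target_label, detected_labels): the object named in
    the caption and the labels a detector reported for the generated
    image. Both names are hypothetical; the detector itself is not run here.
    """
    hits = defaultdict(int)    # images per label where the object was found
    totals = defaultdict(int)  # images generated per label

    for target, detected in samples:
        totals[target] += 1
        if target in detected:
            hits[target] += 1

    if not totals:
        raise ValueError("at least one sample is required")

    # Average of per-class detection rates (each class weighted equally).
    per_class = [hits[label] / totals[label] for label in totals]
    class_average = sum(per_class) / len(per_class)

    # Detection rate over all images (each image weighted equally).
    image_average = sum(hits.values()) / sum(totals.values())
    return class_average, image_average


# Toy data: two images generated from "car" captions, one from a "dog" caption.
samples = [
    ("car", {"car", "person"}),  # car detected
    ("car", {"truck"}),          # car missed
    ("dog", {"dog"}),            # dog detected
]
class_average, image_average = semantic_object_accuracy(samples)
```

In this toy data the car is found in one of two images and the dog in one of one, so the per-class average and the per-image average differ. Reporting both is one way to keep frequent object classes from dominating the score.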
Similar Papers
In the same crypt: Computer Vision
Old Age
Old Age
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
👻 Ghosted
You Only Look Once: Unified, Real-Time Object Detection
Old Age
SSD: Single Shot MultiBox Detector
Old Age
Squeeze-and-Excitation Networks
R.I.P.
👻 Ghosted