Collaborative Annotation of Semantic Objects in Images with Multi-granularity Supervisions

June 27, 2018 · Declared Dead · 🏛 ACM Multimedia

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Lishi Zhang, Chenghan Fu, Jia Li
arXiv ID: 1806.10269
Category: cs.CV (Computer Vision)
Citations: 8
Venue: ACM Multimedia
Last Checked: 3 months ago

Abstract

Per-pixel masks of semantic objects are useful in many applications but tedious to annotate. In this paper, we propose a human-agent collaborative annotation approach that efficiently generates per-pixel masks of semantic objects in tagged images with multi-granularity supervisions. Given a set of tagged images, a computer agent is first dynamically generated to roughly localize the semantic objects described by the tag. The agent extracts a massive number of object proposals from an image and then infers the tag-related ones under weak and strong supervisions from linguistically and visually similar images and from previously annotated object masks. By representing such supervisions with over-complete dictionaries, the tag-related object proposals pop out according to their sparse coding length and are then converted to superpixels with binary labels. After that, human annotators participate in the annotation process by flipping labels and dividing superpixels with mouse clicks; these clicks serve as supervisions that teach the agent to correct false positives and false negatives when processing images with the same tags. Experimental results show that our approach facilitates the annotation process and generates object masks that are highly consistent with those generated by the LabelMe toolbox.
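
Since no code is linked for this paper, the following is only a minimal sketch of two ideas named in the abstract, not the authors' implementation. It makes several assumptions: "sparse coding length" is approximated by the number of dictionary atoms that a greedy orthogonal matching pursuit needs to reconstruct a proposal's feature vector, superpixels are given as an integer label map, and a click flips the binary label of the superpixel under the cursor. The function names (omp_atom_count, flip_superpixel, labels_to_mask) and the tolerance parameter are hypothetical.

```python
import numpy as np


def omp_atom_count(x, D, tol=1e-3):
    """Count the atoms (columns of D) that greedy OMP needs to reconstruct x.

    Assumption: this count stands in for the abstract's "sparse coding
    length"; the paper may define that quantity differently.
    """
    x = np.asarray(x, dtype=float)
    target = tol * np.linalg.norm(x)
    residual = x.copy()
    support = []
    while len(support) < D.shape[1] and np.linalg.norm(residual) > target:
        corr = np.abs(D.T @ residual)
        if support:
            corr[support] = -1.0  # never pick the same atom twice
        support.append(int(np.argmax(corr)))
        # Re-fit coefficients on all selected atoms (the "orthogonal" step).
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    return len(support)


def flip_superpixel(segments, labels, click_rc):
    """Return a copy of labels with the clicked superpixel's label flipped.

    segments: 2-D integer array mapping each pixel to a superpixel id.
    labels:   1-D array of 0/1 labels, one per superpixel id.
    click_rc: (row, col) position of the mouse click.
    """
    sp = int(segments[click_rc])
    updated = labels.copy()
    updated[sp] = 1 - updated[sp]
    return updated


def labels_to_mask(segments, labels):
    """Expand per-superpixel binary labels into a per-pixel boolean mask."""
    return labels[segments].astype(bool)
```

Under these assumptions, a proposal whose feature vector a tag-specific dictionary encodes with few atoms could be treated as tag-related, and each annotator click updates one superpixel label before the per-pixel mask is regenerated. Dividing a superpixel and feeding clicks back to the agent, both mentioned in the abstract, are not covered by this sketch.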