Generating Easy-to-Understand Referring Expressions for Target Identifications

November 29, 2018 · Entered Twilight · 🏛 arXiv.org

"Last commit was 6.0 years ago (≥5 year threshold)"

Evidence collected by the PWNC Scanner

Repo contents: .gitignore, LICENSE.md, README.md, config.py, demo, eval_comprehension.py, eval_generation.py, misc, models, prepro.py, pyutils, rerank_generated_captions.py, scripts, train.py

Authors: Mikihiro Tanaka, Takayuki Itamochi, Kenichi Narioka, Ikuro Sato, Yoshitaka Ushiku, Tatsuya Harada
arXiv ID: 1811.12104
Category: cs.CV (Computer Vision)
Citations: 1
Venue: arXiv.org
Repository: https://github.com/mikittt/easy-to-understand-REG (⭐ 18)
Last Checked: 2 months ago

Abstract

This paper addresses the generation of referring expressions that not only refer to objects correctly but also let humans find them quickly. As a target becomes relatively less salient, identifying the referred object itself becomes more difficult. However, existing studies have regarded all sentences that refer to objects correctly as equally good, ignoring whether they are easily understood by humans. If the target is not salient, humans use its relationships with the salient context around it to help listeners comprehend it better. To derive this information from human annotations, we design our model to extract information from both the target and the environment. Moreover, we regard easily understood sentences as those that humans comprehend correctly and quickly. We optimize for this using the time humans require to locate the referred objects and the accuracy with which they do so. To evaluate our system, we created a new referring expression dataset whose images were acquired from Grand Theft Auto V (GTA V), limiting targets to persons. Experimental results show the effectiveness of our approach. Our code and dataset are available at https://github.com/mikittt/easy-to-understand-REG.