Towards Annotation-Free Evaluation of Cross-Lingual Image Captioning

December 09, 2020 · Declared Dead · 🏛 ACM Multimedia Asia

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Aozhu Chen, Xinyi Huang, Hailan Lin, Xirong Li
arXiv ID: 2012.04925
Category: cs.CV (Computer Vision)
Cross-listed: cs.MM
Citations: 5
Venue: ACM Multimedia Asia
Last Checked: 3 months ago

Abstract

Cross-lingual image captioning, with its ability to caption an unlabeled image in a target language other than English, is an emerging topic in the multimedia field. To spare human annotators from re-writing reference sentences for each target language, in this paper we attempt annotation-free evaluation of cross-lingual image captioning. Two scenarios are investigated, depending on whether English references are assumed to be available. For the first scenario, with the references available, we propose two metrics, WMDRel and CLinRel. WMDRel measures the semantic relevance between a model-generated caption and the machine translation of an English reference using their Word Mover's Distance. CLinRel is a visual-oriented cross-lingual relevance measure, computed by projecting both captions into a deep visual feature space. For the second scenario, which has zero references and is thus more challenging, we propose CMedRel, which computes a cross-media relevance between the generated caption and the image content in the same visual feature space as used by CLinRel. The promising results show the high potential of the new metrics for evaluation without references in the target language.
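The following is a minimal sketch of the two kinds of relevance score the abstract describes, not the authors' implementation. It makes several assumptions that the abstract does not confirm: the exact Word Mover's Distance is replaced by the simpler relaxed lower bound (each word moves all its mass to its nearest counterpart), relevance in the visual feature space is taken to be cosine similarity, and the word embeddings and feature vectors are made-up toy values. The names `relaxed_wmd` and `visual_relevance` are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def relaxed_wmd(tokens_a, tokens_b, emb):
    """Relaxed Word Mover's Distance (a lower bound on the exact WMD).

    Assumption: uniform weight per token. Each token sends its whole
    mass to the nearest token of the other sentence; the larger of the
    two one-way costs is returned.
    """
    def one_way(src, dst):
        total = sum(min(euclidean(emb[s], emb[d]) for d in dst) for s in src)
        return total / len(src)

    return max(one_way(tokens_a, tokens_b), one_way(tokens_b, tokens_a))


def visual_relevance(feat_a, feat_b):
    """Relevance of two vectors in a shared visual feature space.

    Assumption: cosine similarity. For a CLinRel-style score both inputs
    would be projected captions; for a CMedRel-style score one input
    would be a projected caption and the other an image feature.
    """
    return cosine(feat_a, feat_b)


# Toy 2-D word embeddings (hypothetical values, for illustration only).
emb = {
    "dog": [1.0, 0.0],
    "puppy": [0.9, 0.1],
    "runs": [0.0, 1.0],
    "car": [-1.0, 0.0],
}

generated = ["puppy", "runs"]        # model-generated caption
translated_ref = ["dog", "runs"]     # machine-translated reference
unrelated = ["car", "runs"]

d_close = relaxed_wmd(generated, translated_ref, emb)
d_far = relaxed_wmd(unrelated, translated_ref, emb)

# Toy vectors standing in for a projected caption and an image feature.
caption_feat = [0.8, 0.6]
image_feat = [0.6, 0.8]
score = visual_relevance(caption_feat, image_feat)
```

In this toy setup `d_close` is smaller than `d_far`, since "puppy" lies near "dog" in the embedding space while "car" does not. A distance such as this still has to be mapped to a relevance score, and the abstract does not say how; the learned projection into the visual feature space is likewise outside the scope of this sketch.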