Cross-Modal and Uni-Modal Soft-Label Alignment for Image-Text Retrieval
March 08, 2024 · Entered Twilight · AAAI Conference on Artificial Intelligence
Repo contents: README.md, _doc, clip, configs, dataset, dataset_evalimg, dataset_example, evaluation.py, evaluation_eccv.py, evaluation_img.py, evaluation_sts.py, optim, requirements.txt, retrieval.py, scheduler, unire, utils.py
Authors
Hailang Huang, Zhijie Nie, Ziqiao Wang, Ziyu Shang
arXiv ID
2403.05261
Category
cs.CV: Computer Vision
Cross-listed
cs.MM
Citations
35
Venue
AAAI Conference on Artificial Intelligence
Repository
https://github.com/lerogo/aaai24_itr_cusa
⭐ 55
Last Checked
1 month ago
Abstract
Current image-text retrieval methods have demonstrated impressive performance in recent years. However, they still face two problems: the inter-modal matching missing problem and the intra-modal semantic loss problem. These problems can significantly affect the accuracy of image-text retrieval. To address these challenges, we propose a novel method called Cross-modal and Uni-modal Soft-label Alignment (CUSA). Our method leverages the power of uni-modal pre-trained models to provide soft-label supervision signals for the image-text retrieval model. Additionally, we introduce two alignment techniques, Cross-modal Soft-label Alignment (CSA) and Uni-modal Soft-label Alignment (USA), to overcome false negatives and enhance similarity recognition between uni-modal samples. Our method is designed to be plug-and-play, meaning it can be easily applied to existing image-text retrieval models without changing their original architectures. Through extensive experiments on various image-text retrieval models and datasets, we demonstrate that our method can consistently improve the performance of image-text retrieval and achieve new state-of-the-art results. Furthermore, our method can also boost the uni-modal retrieval performance of image-text retrieval models, enabling it to achieve universal retrieval. The code and supplementary files can be found at https://github.com/lerogo/aaai24_itr_cusa.
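The soft-label idea in the abstract can be illustrated with a minimal sketch. It is not taken from the repository. It assumes that "alignment" means pulling the retrieval model's in-batch similarity distribution toward the one produced by a uni-modal pre-trained teacher, and it assumes KL divergence over softmax-normalized similarity rows as the loss, since the abstract does not name one. The function names, the temperature value, and the toy similarity matrices are all hypothetical.

```python
import math


def softmax(row, temperature=1.0):
    """Turn one row of similarity scores into a probability distribution."""
    scaled = [s / temperature for s in row]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def soft_label_alignment_loss(teacher_sim, student_sim, temperature=0.1):
    """Mean row-wise KL between teacher and student similarity distributions.

    Assumption: the teacher rows come from a uni-modal pre-trained model and
    act as soft labels; the student rows come from the retrieval model.
    """
    losses = []
    for t_row, s_row in zip(teacher_sim, student_sim):
        target = softmax(t_row, temperature)
        pred = softmax(s_row, temperature)
        losses.append(kl_divergence(target, pred))
    return sum(losses) / len(losses)


# Hypothetical batch of 3 samples. The teacher considers samples 0 and 1
# near-duplicates, so treating them as hard negatives would be a false negative.
teacher_sim = [
    [0.9, 0.8, 0.1],
    [0.8, 0.9, 0.2],
    [0.1, 0.2, 0.9],
]

# A student trained only with one-hot (hard) labels scores the diagonal alone.
hard_student_sim = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]

loss_hard = soft_label_alignment_loss(teacher_sim, hard_student_sim)
loss_aligned = soft_label_alignment_loss(teacher_sim, teacher_sim)
print("loss for hard-label student:", loss_hard)
print("loss for aligned student:", loss_aligned)
```

Under these assumptions the same function could serve both settings described in the abstract: feeding it image-to-text similarities would correspond to the cross-modal case (CSA), and feeding it image-to-image or text-to-text similarities would correspond to the uni-modal case (USA). In a plug-and-play setup such a term would be added to the host model's existing loss, leaving its architecture unchanged.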
Similar Papers
In the same crypt: Computer Vision
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
R.I.P.
Ghosted
You Only Look Once: Unified, Real-Time Object Detection
Old Age
SSD: Single Shot MultiBox Detector
Old Age
Squeeze-and-Excitation Networks
R.I.P.
Ghosted